-
KmerCo: A lightweight K-mer counting technique with a tiny memory footprint
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
K-mer counting is a requisite process for DNA assembly because it speeds up the overall assembly process. The frequency of K-mers is used for estimating the parameters of DNA assembly, error correction, etc. The process also provides a list of distinct K-mers, which assists in searching large databases and reducing the size of de Bruijn graphs. Nonetheless, K-mer counting is a data- and compute-intensive process. Hence, it is crucial to implement a lightweight data structure that occupies little memory yet processes K-mers quickly. We propose a lightweight K-mer counting technique, called KmerCo, that implements a potent counting Bloom Filter variant called countBF. KmerCo has two phases: insertion and classification. The insertion phase inserts all K-mers into countBF and determines the distinct K-mers. The classification phase classifies the distinct K-mers into trustworthy and erroneous K-mers based on a user-provided threshold value. We also propose a novel benchmark performance metric. We used a Hadoop MapReduce program to determine the frequency of K-mers. We have conducted rigorous experiments to demonstrate the superiority of KmerCo over state-of-the-art K-mer counting techniques. The experiments are conducted using DNA sequences of four organisms, and the datasets are pruned to generate datasets of four different sizes. KmerCo is compared with Squeakr, BFCounter, and Jellyfish. Compared with these three methods, KmerCo used the least memory, achieved the highest number of insertions per second, and achieved a positive trustworthy rate.
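The insertion and classification phases can be illustrated with a small sketch. This is not KmerCo itself: countBF's internal layout and its modified murmur hash are not described in this abstract, so the sketch assumes a generic counting Bloom filter (`CountingBloomFilter`, a hypothetical name) hashed with `hashlib.blake2b`, and a hypothetical `count_and_classify` helper that treats a K-mer as new when its estimated count is still zero before insertion.

```python
import hashlib


class CountingBloomFilter:
    """Generic counting Bloom filter (a stand-in; not the countBF design)."""

    def __init__(self, size: int = 1 << 16, num_hashes: int = 3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item: str):
        # Derive num_hashes counter indices by salting the input with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}|{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def estimate(self, item: str) -> int:
        # Minimum over the item's counters: never below the true count, may be above it.
        return min(self.counters[p] for p in self._positions(item))


def kmers(read: str, k: int):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_and_classify(reads, k, threshold, cbf=None):
    """Insertion phase, then classification of distinct K-mers by a user threshold."""
    if cbf is None:
        cbf = CountingBloomFilter()
    distinct = []
    for read in reads:
        for kmer in kmers(read, k):
            if cbf.estimate(kmer) == 0:  # first sighting (a false positive could mask one)
                distinct.append(kmer)
            cbf.add(kmer)
    trustworthy = [x for x in distinct if cbf.estimate(x) >= threshold]
    erroneous = [x for x in distinct if cbf.estimate(x) < threshold]
    return trustworthy, erroneous


trusted, erroneous = count_and_classify(["ACGTACGT", "ACGTACGA"], k=4, threshold=2)
print(sorted(trusted), erroneous)
```

Because a counting Bloom filter can over-estimate counts, a false positive during insertion could hide a genuinely new K-mer or push an erroneous one over the threshold; a larger filter makes this less likely.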
Submitted 28 April, 2023;
originally announced May 2023.
-
An Investigation on Learning, Polluting, and Unlearning the Spam Emails for Lifelong Learning
Authors:
Nishchal Parne,
Kyathi Puppaala,
Nithish Bhupathi,
Ripon Patgiri
Abstract:
Machine unlearning for security is studied in this context. Several spam email detection methods exist, each of which employs a different algorithm to detect undesired spam emails. But these models are vulnerable to attacks. Many attackers exploit a model by polluting, in various ways, the data it is trained on. To respond deftly in such situations, a model needs to readily unlearn the polluted data without retraining. Retraining is impractical in most cases because the model has already been trained on a massive amount of data, all of which would have to be trained on again just to remove a small amount of polluted data, often well under 1% of the total. This problem can be solved by developing unlearning frameworks for all spam detection models. In this research, an unlearning module is integrated into spam detection models based on the Naive Bayes, Decision Tree, and Random Forest algorithms. To assess the benefits of unlearning over retraining, three spam detection models are polluted and exploited from an attacker's position, proving the models' vulnerability. Reductions in accuracy and true positive rate are shown in each case, demonstrating the effect of pollution on the models. Then unlearning modules are integrated into the models and the polluted data is unlearned; testing the models after unlearning shows that performance is restored. Also, unlearning and retraining times are compared for different pollution data sizes on all models. From these findings, it can be concluded that unlearning is considerably superior to retraining. Results show that unlearning is fast, easy to implement, easy to use, and effective.
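For a count-based classifier such as multinomial Naive Bayes, unlearning can be as simple as subtracting the polluted samples' counts, which is why it can be far cheaper than retraining. The sketch below is a minimal illustration under that assumption, not the paper's module; the class `UnlearnableNaiveBayes` and its methods are hypothetical names, and Laplace smoothing is our choice.

```python
import math
from collections import Counter, defaultdict


class UnlearnableNaiveBayes:
    """Multinomial Naive Bayes whose sufficient statistics are plain counts."""

    def __init__(self):
        self.class_docs = Counter()             # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = Counter()                  # token counts across all labels

    def learn(self, tokens, label):
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def unlearn(self, tokens, label):
        # Subtract exactly what learn() added; O(len(tokens)), no retraining.
        self.class_docs[label] -= 1
        self.word_counts[label].subtract(tokens)
        self.vocab.subtract(tokens)
        # Unary + drops zero entries, so the state matches a never-polluted model.
        self.class_docs = +self.class_docs
        self.word_counts[label] = +self.word_counts[label]
        self.vocab = +self.vocab

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + v   # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

    def state(self):
        return (dict(self.class_docs),
                {label: dict(c) for label, c in self.word_counts.items() if c},
                dict(self.vocab))
```

Usage: train on clean emails, inject mislabeled samples (the attack), then call `unlearn` on exactly those samples to restore the clean state.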
Submitted 24 December, 2021; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Graph Neural Networks: Methods, Applications, and Opportunities
Authors:
Lilapati Waikhom,
Ripon Patgiri
Abstract:
In the last decade or so, we have witnessed deep learning reinvigorating the machine learning field. It has solved many problems in the domains of computer vision, speech recognition, natural language processing, and various other tasks with state-of-the-art performance. In these domains, the data are generally represented in Euclidean space. Various other domains conform to non-Euclidean space, for which a graph is an ideal representation. Graphs are suitable for representing the dependencies and interrelationships between various entities. Traditional handcrafted features for graphs cannot provide the necessary inference for various tasks from this complex data representation. Recently, advances in deep learning have increasingly been applied to graph-based tasks. This article provides a comprehensive survey of graph neural networks (GNNs) in each learning setting: supervised, unsupervised, semi-supervised, and self-supervised learning. A taxonomy of each graph-based learning setting is provided, with logical divisions of the methods falling in that setting. The approaches for each learning task are analyzed from both theoretical and empirical standpoints. Further, we provide general architecture guidelines for building GNNs. Various applications and benchmark datasets are also provided, along with open challenges still hindering the general applicability of GNNs.
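The core idea shared by many GNNs, updating each node from its neighbours' features, can be shown with one message-passing layer. This is a minimal sketch in the spirit of GCN/GraphSAGE-style mean aggregation, not a specific method from the survey; the function names, the adjacency-list format, and the single weight matrix `W` are assumptions for illustration.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def matvec(W, x):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_aggregate_layer(adj, features, W):
    """One layer: h_v' = ReLU(W * mean({h_u : u in N(v)} plus h_v))."""
    out = {}
    for v, neighbours in adj.items():
        group = [features[u] for u in neighbours] + [features[v]]  # include a self-loop
        dim = len(features[v])
        mean = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        out[v] = [relu(z) for z in matvec(W, mean)]
    return out


# A 3-node path graph 0 - 1 - 2 with 2-dimensional features.
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
print(mean_aggregate_layer(adj, features, W))
```

Stacking such layers lets information propagate further across the graph; in a trained model `W` is learned rather than fixed.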
Submitted 8 September, 2021; v1 submitted 24 August, 2021;
originally announced August 2021.
-
A Review on Edge Analytics: Issues, Challenges, Opportunities, Promises, Future Directions, and Applications
Authors:
Sabuzima Nayak,
Ripon Patgiri,
Lilapati Waikhom,
Arif Ahmed
Abstract:
Edge technology aims to bring Cloud resources (specifically, compute, storage, and network) into close proximity of the Edge devices, i.e., the smart devices where data are produced and consumed. Embedding computing and applications in Edge devices has led to the emergence of two new concepts in Edge technology, namely, Edge computing and Edge analytics. Edge analytics uses techniques or algorithms to analyze the data generated by the Edge devices. With the emergence of Edge analytics, Edge devices have become a complete set. Currently, Edge analytics is unable to provide full support for the execution of analytic techniques. Edge devices cannot execute advanced and sophisticated analytic algorithms owing to various constraints such as limited power supply, small memory size, and limited resources. This article aims to provide a detailed discussion of Edge analytics. It provides a clear explanation distinguishing the three concepts of Edge technology, namely, Edge devices, Edge computing, and Edge analytics, along with their issues. Furthermore, the article discusses the implementation of Edge analytics to solve many problems in various areas such as retail, agriculture, industry, and healthcare. In addition, state-of-the-art research papers on Edge analytics are rigorously reviewed to explore the existing issues, emerging challenges, research opportunities and their directions, and applications.
Submitted 1 July, 2021;
originally announced July 2021.
-
RobustBF: A High Accuracy and Memory Efficient 2D Bloom Filter
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
Bloom Filter is an important probabilistic data structure for reducing memory consumption in membership filtering. It is applied in diverse domains such as Computer Networking, Network Security and Privacy, IoT, Edge Computing, Cloud Computing, Big Data, and Biometrics. However, Bloom Filter suffers from a nonzero false positive probability. To address this issue, we propose a novel robust Bloom Filter, robustBF for short. robustBF is a 2D Bloom Filter capable of filtering millions of items with high accuracy without compromising performance. Our proposed system is presented in two parts. Firstly, we modify the murmur hash function, test all the modified hash functions for improvements, and experimentally select the best one. Secondly, we embed the modified hash functions in a 2D Bloom Filter. Our experimental results show that robustBF is better than the standard Bloom Filter and the counting Bloom Filter in every aspect. robustBF exhibits a nearly zero false positive probability with more than $10\times$ and $44\times$ lower memory consumption than the standard Bloom Filter and the counting Bloom Filter, respectively.
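A 2D Bloom Filter can be pictured as a bit matrix in which each hash selects a (row, column) cell instead of a single index. The sketch below is a minimal illustration under stated assumptions: robustBF's modified murmur hash and exact layout are not given in this abstract, so `hashlib.blake2b` stands in for the hash and each 64-bit digest is split into a row part and a column part; `BloomFilter2D` is a hypothetical name.

```python
import hashlib


class BloomFilter2D:
    """Bit matrix of rows x cols; each row is stored as a Python int bitmap."""

    def __init__(self, rows: int = 1024, cols: int = 64, num_hashes: int = 3):
        self.rows, self.cols, self.num_hashes = rows, cols, num_hashes
        self.matrix = [0] * rows

    def _cells(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}|{item}".encode(), digest_size=8).digest()
            h = int.from_bytes(digest, "little")
            # High 32 bits pick the row, low 32 bits pick the column.
            yield (h >> 32) % self.rows, (h & 0xFFFFFFFF) % self.cols

    def add(self, item: str) -> None:
        for r, c in self._cells(item):
            self.matrix[r] |= 1 << c

    def __contains__(self, item: str) -> bool:
        return all((self.matrix[r] >> c) & 1 for r, c in self._cells(item))


bf = BloomFilter2D()
for i in range(1000):
    bf.add(f"item-{i}")
print("item-42" in bf)
```

Like any Bloom Filter, this answers "possibly present" or "definitely absent"; inserted items are never reported absent.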
Submitted 8 September, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.
-
countBF: A General-purpose High Accuracy and Space Efficient Counting Bloom Filter
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
Bloom Filter is a probabilistic data structure for membership queries, and it has been extensively studied in various fields to reduce memory consumption and enhance system performance. Bloom Filters are classified into two key categories: counting Bloom Filters (CBF) and non-counting Bloom Filters. CBF has a higher false positive probability and a higher memory footprint than the standard Bloom Filter (SBF). However, CBF supports deletion without introducing false negatives. SBF is also free of false negatives, but it cannot support delete operations as CBF does. To address these issues, we present a novel counting Bloom Filter based on SBF and the 2D Bloom Filter, called countBF. countBF uses a modified murmur hash function, evaluated experimentally, to meet its various requirements. Our experimental results show that countBF uses $1.96\times$ and $7.85\times$ less memory than SBF and CBF, respectively, while achieving a lower false positive probability and execution time than both. The overall accuracy of countBF is $99.999921\%$, which demonstrates its superiority over SBF and CBF. We also compare countBF with other state-of-the-art counting Bloom Filters.
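Deletion support and memory footprint pull against each other in a counting Bloom Filter: every bit becomes a counter. One common way to limit the cost, shown here as an assumption rather than countBF's actual design, is to pack 4-bit saturating counters two per byte. `PackedCountingBloomFilter` is a hypothetical name, and `hashlib.blake2b` stands in for the modified murmur hash.

```python
import hashlib


class PackedCountingBloomFilter:
    """Counting Bloom filter with 4-bit saturating counters, two per byte."""

    MAX = 15  # largest value a 4-bit counter can hold

    def __init__(self, num_counters: int = 4096, num_hashes: int = 3):
        self.n, self.k = num_counters, num_hashes
        self.cells = bytearray((num_counters + 1) // 2)

    def _get(self, i: int) -> int:
        b = self.cells[i >> 1]
        return (b >> 4) if i & 1 else (b & 0x0F)

    def _set(self, i: int, v: int) -> None:
        b = self.cells[i >> 1]
        self.cells[i >> 1] = ((b & 0x0F) | (v << 4)) if i & 1 else ((b & 0xF0) | v)

    def _positions(self, item: str):
        for i in range(self.k):
            d = hashlib.blake2b(f"{i}|{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.n

    def add(self, item: str) -> None:
        for p in self._positions(item):
            c = self._get(p)
            if c < self.MAX:  # saturate instead of overflowing into the neighbour nibble
                self._set(p, c + 1)

    def remove(self, item: str) -> bool:
        if item not in self:
            return False
        for p in self._positions(item):
            c = self._get(p)
            if 0 < c < self.MAX:  # saturated counters stay put to avoid false negatives
                self._set(p, c - 1)
        return True

    def __contains__(self, item: str) -> bool:
        return all(self._get(p) > 0 for p in self._positions(item))


cbf = PackedCountingBloomFilter()
cbf.add("x")
cbf.add("y")
cbf.remove("x")
print("y" in cbf)  # deleting "x" does not create a false negative for "y"
```

A standard Bloom Filter cannot do this, because clearing a bit for "x" might also clear a bit that "y" depends on.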
Submitted 6 June, 2021;
originally announced June 2021.
-
DeepBF: Malicious URL detection using Learned Bloom Filter and Evolutionary Deep Learning
Authors:
Ripon Patgiri,
Anupam Biswas,
Sabuzima Nayak
Abstract:
Malicious URL detection is an emerging research area due to the continuous modernization of various systems, for instance, Edge Computing. In this article, we present a novel malicious URL detection technique, called deepBF (deep learning and Bloom Filter). deepBF is presented in two parts. Firstly, we propose a learned Bloom Filter using a 2-dimensional Bloom Filter. We experimentally determine the best non-cryptographic string hash function. Then, we derive a modified non-cryptographic string hash function for deepBF from the selected one by introducing biases into the hashing method, and compare it with other variants of diverse non-cryptographic string hash functions. It is also compared with various filters, particularly, the counting Bloom Filter, Kirsch et al., and the Cuckoo Filter, using various use cases. The use cases reveal the weaknesses and strengths of the filters. Secondly, we propose a malicious URL detection mechanism using deepBF. We apply an evolutionary convolutional neural network to identify malicious URLs. The evolutionary convolutional neural network is trained and tested with malicious URL datasets, and its output is tested in deepBF for accuracy. Our experimental evaluation leads to several conclusive decisions, which are presented in the article.
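A learned Bloom Filter pairs a classifier with a backup filter: keys the classifier scores above a threshold are accepted directly, and the keys it would miss are stored in the backup so the combined structure has no false negatives. The sketch below shows that general construction under loud assumptions: `toy_url_score` is a hypothetical hand-written heuristic standing in for deepBF's evolutionary CNN, and a plain bit-array filter (`BitBloomFilter`, also hypothetical) stands in for its 2D Bloom Filter and modified hash.

```python
import hashlib


class BitBloomFilter:
    """Plain bit-array Bloom filter used as the backup (stand-in for a 2D filter)."""

    def __init__(self, m_bits: int = 8192, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: str):
        for i in range(self.k):
            d = hashlib.blake2b(f"{i}|{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(d, "little") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(key))


def toy_url_score(url: str) -> float:
    """Hypothetical suspicion score in [0, 1]; a real system would use a trained model."""
    signals = [url.startswith("http://"), "@" in url,
               url.count(".") > 3, "login" in url or "verify" in url]
    return sum(signals) / len(signals)


class LearnedBloomFilter:
    def __init__(self, score_fn, threshold: float = 0.5, backup=None):
        self.score_fn, self.threshold = score_fn, threshold
        self.backup = backup if backup is not None else BitBloomFilter()
        self.backup_count = 0

    def build(self, malicious_urls) -> None:
        for url in malicious_urls:
            if self.score_fn(url) < self.threshold:  # the classifier would miss this key
                self.backup.add(url)
                self.backup_count += 1

    def __contains__(self, url: str) -> bool:
        return self.score_fn(url) >= self.threshold or url in self.backup


lbf = LearnedBloomFilter(toy_url_score)
bad = ["http://paypa1.com.verify.example.ru/login", "https://safe-looking.example.com/x"]
lbf.build(bad)
print(all(u in lbf for u in bad), lbf.backup_count)
```

False positives can now come from either the classifier (benign URLs scored high) or the backup filter, which is why the threshold choice matters.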
Submitted 26 February, 2022; v1 submitted 18 March, 2021;
originally announced March 2021.
-
6G Communication Technology: A Vision on Intelligent Healthcare
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
6G is a promising communication technology that will dominate the entire health market from 2030 onward. It will dominate not only the health sector but also diverse other sectors. It is expected that 6G will revolutionize many sectors, including healthcare. Healthcare will be fully AI-driven and dependent on 6G communication technology, which will change our perception of lifestyle. Currently, time and space are the key barriers to healthcare, and 6G will be able to overcome these barriers. 6G is also expected to prove a game-changing technology for healthcare. Therefore, in this perspective, we envision a healthcare system for the era of 6G communication technology. Various new methodologies must also be introduced to enhance our lifestyle; this perspective addresses them, including Quality of Life (QoL), Intelligent Wearable Devices (IWD), the Intelligent Internet of Medical Things (IIoMT), Hospital-to-Home (H2H) services, and a new business model. In addition, we examine the role of 6G communication technology in telesurgery, epidemics, and pandemics.
Submitted 16 April, 2020;
originally announced May 2020.
-
6G Communications: A Vision on the Potential Applications
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
6G communication technology is a revolutionary technology that will transform many technologies and applications. Furthermore, it will be truly AI-driven and will carry on intelligent space. Hence, it will enable the Internet of Everything (IoE), which will also impact many technologies and applications. 6G communication technology promises a high Quality of Service (QoS) and a high Quality of Experience (QoE). With the combination of IoE and 6G communication technology, the number of applications will explode in the near future, particularly in vehicles, drones, homes, cities, hospitals, and so on, leaving no area untouched. Hence, it is expected that many existing technologies will fully depend on 6G communication technology to enhance their performance. 6G communication technology will prove to be a game-changing communication technology in many fields and will be able to influence many applications. Therefore, we envision the potential applications of 6G communication technology in the near future.
Submitted 23 April, 2020;
originally announced May 2020.
-
A Review on Impact of Bloom Filter on Named Data Networking: The Future Internet Architecture
Authors:
Sabuzima Nayak,
Ripon Patgiri,
Angana Borah
Abstract:
Today is the era of smart devices. Through smart devices, people remain connected with systems across the globe even while mobile. Hence, the current Internet is facing scalability issues. Therefore, leaving the IP-based Internet behind because of these scalability limits, the world is moving to a Future Internet Architecture called Named Data Networking (NDN). Currently, the number of nodes connected to the Internet is in the billions, and the number of requests sent is in the millions per second. NDN handles such huge numbers by modifying the IP architecture to meet current requirements. NDN is scalable, produces less traffic and congestion, provides high-level security, saves bandwidth, efficiently utilizes multiple network interfaces, and has many more functionalities. Bloom Filter is the only good choice for deployment in various modules of NDN to handle the huge number of packets. Bloom Filter is a simple probabilistic data structure for membership queries. This article presents a detailed discussion of the role of Bloom Filter in implementing NDN. The article includes a precise discussion of Bloom Filter, and the main components of the NDN architecture, namely, packets, the content store, the forwarding information base, and the pending interest table, are also discussed briefly.
Submitted 7 April, 2020;
originally announced May 2020.
-
Big Computing: Where are we heading?
Authors:
Sabuzima Nayak,
Ripon Patgiri,
Thoudam Doren Singh
Abstract:
This paper presents an overview of the current trends of Big data against the computing scenario from different aspects. Important aspects include Exascale computing, computing power, and the kinds of applications that generate Big data. The paper begins with the constraints of current computing hardware against the needs of rising Big data applications. We highlight the issues and challenges of energy requirements, software complexity, hardware failure, fault-tolerant computing, and communication. As the complexity of computation is going to rise in the future, the paper also highlights the future direction of Big computing systems for Bioinformatics, social media, hardware and software requirements, data-intensive computation, and, finally, the GPU era.
Submitted 9 April, 2020;
originally announced May 2020.
-
A Survey on Large Scale Metadata Server for Big Data Storage
Authors:
Ripon Patgiri,
Sabuzima Nayak
Abstract:
Big Data is defined as a high volume of varied data with an exponential growth rate. Data are amalgamated to generate revenue, which results in a large data silo. Data are the oil of the modern IT industry, and therefore they are growing at an exponential pace. The access mechanism of these data silos is defined by metadata. The metadata are decoupled from the data server for various beneficial reasons, for instance, ease of maintenance. The metadata are stored in a metadata server (MDS). Therefore, studying the MDS is essential when designing a large-scale storage system. The MDS requires many parameters to be incorporated into its architecture, and that architecture depends on the requirements of the storage system. Thus, MDS is categorized in various ways depending on the underlying architecture and design methodology. The article surveys the various kinds of MDS architectures, designs, and methodologies. It emphasizes clustered MDS (cMDS), and the reports are organized as a) Bloom filter-based MDS, b) Client-funded MDS, c) Geo-aware MDS, d) Cache-aware MDS, e) Load-aware MDS, f) Hash-based MDS, and g) Tree-based MDS. Additionally, the article presents the issues and challenges of MDS for mammoth-sized data.
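As an illustration of the hash-based category, metadata for a path can be assigned to a server by hashing the path. The sketch below uses consistent hashing with virtual nodes, one common hash-based scheme; it is our choice for illustration, not necessarily what any surveyed system does, and `ConsistentHashRing` and the server names are hypothetical.

```python
import bisect
import hashlib


def _hash64(text: str) -> int:
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps file paths to metadata servers; adding a server remaps only a fraction of paths."""

    def __init__(self, servers, vnodes: int = 64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server) virtual nodes
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash64(f"{server}#{i}"), server))

    def lookup(self, path: str) -> str:
        # First virtual node clockwise from the path's hash, wrapping around the ring.
        idx = bisect.bisect_left(self.ring, (_hash64(path), ""))
        return self.ring[idx % len(self.ring)][1]


ring = ConsistentHashRing(["mds0", "mds1", "mds2"])
paths = [f"/data/user{i}/file{j}" for i in range(20) for j in range(10)]
before = {p: ring.lookup(p) for p in paths}
ring.add_server("mds3")
after = {p: ring.lookup(p) for p in paths}
moved = [p for p in paths if before[p] != after[p]]
print(f"{len(moved)} of {len(paths)} paths moved")
```

With plain `hash(path) % N`, adding a server would remap most paths; here every moved path lands on the new server, which limits metadata migration.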
Submitted 11 April, 2020;
originally announced May 2020.
-
6G Communication: Envisioning the Key Issues and Challenges
Authors:
Sabuzima Nayak,
Ripon Patgiri
Abstract:
By 2030, we are going to witness 6G mobile communication technology, which will enable the Internet of Everything. Although 5G has yet to be experienced by people worldwide and B5G has yet to be developed, researchers have already started planning, envisioning, and gathering requirements for 6G. Moreover, many countries have already initiated research on 6G. 6G promises to connect every smart device to the Internet, from smartphones to intelligent vehicles. 6G will provide sophisticated, high-QoS services such as holographic communication, augmented reality/virtual reality, and many more. It will also focus on Quality of Experience (QoE) to provide rich experiences. Notably, it is very important to envision the issues and challenges of 6G technology; otherwise, its promises may not be delivered on time. The requirements of 6G pose new challenges to the research community. To achieve the desired parameters of 6G, researchers are exploring various alternatives. Hence, there are diverse research challenges to envision, from devices to softwarization. Therefore, in this article, we discuss the future issues and challenges to be faced by 6G technology, from every aspect, from hardware to the enabling technologies that will be utilized by 6G.
Submitted 7 June, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Empirical Study on Airline Delay Analysis and Prediction
Authors:
Ripon Patgiri,
Sajid Hussain,
Aditya Nongmeikapam
Abstract:
Big Data analytics is the logical analysis of very large-scale datasets. Data analysis enhances an organization and improves the decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets combined with a weather dataset. In this research work, we consider various attributes to analyze flight delay, for example, day-wise, airline-wise, cloud cover, temperature, etc. Moreover, we present rigorous experiments on various machine learning models to correctly predict flight delay, namely, logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, a Decision Tree classifier, and a Random Forest model. The accuracy of the Random Forest model is 82% with a flight delay threshold of 15 minutes. The analysis is carried out using data from 1987 to 2008; training uses data from 2000 to 2007, and the prediction results are validated using 2008 data. Moreover, the Random Forest model achieves a recall of 99%.
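The 15-minute threshold turns delay prediction into binary classification, and accuracy and recall are then computed on the resulting labels. A minimal sketch follows; treating a flight as delayed when its delay is at least 15 minutes (rather than strictly more) is our assumption, the helper names are hypothetical, and the small arrays are illustrative values, not data from the study.

```python
def label_delays(delays_min, threshold_min=15):
    """1 = delayed (delay >= threshold, an assumed convention), 0 = on time."""
    return [1 if d >= threshold_min else 0 for d in delays_min]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of truly delayed flights that the model flags as delayed.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


y_true = label_delays([0, 20, 45, 5, 15, 3])  # illustrative delays in minutes
y_pred = [0, 1, 0, 0, 1, 1]                   # illustrative model output
print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

Because recall ignores false alarms, a model can reach very high recall while its accuracy stays lower, so the two figures should be read together.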
Submitted 17 February, 2020;
originally announced February 2020.
-
Shed More Light on Bloom Filter's Variants
Authors:
Ripon Patgiri,
Sabuzima Nayak,
Samir Kumar Borgohain
Abstract:
Bloom Filter is a probabilistic membership data structure, and it is extensively used for membership queries. Bloom Filter has become the predominant data structure in approximate membership filtering, and it greatly improves query response time. Bloom Filter (BF) is used to detect whether an element belongs to a given set or not. The Bloom Filter returns True Positive (TP), False Positive (FP), or True Negative (TN), but never False Negative. The Bloom Filter is widely adopted in numerous areas to enhance the performance of a system. In this paper, we present a) in-depth insight into the Bloom Filter, and b) the prominent variants of the Bloom Filter.
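The false positive outcome can be quantified with the standard textbook analysis of a Bloom Filter with $m$ bits, $k$ hash functions, and $n$ inserted elements, assuming independent, uniformly distributed hashes. This derivation is general background rather than a result from the paper:

```latex
% probability that a given bit is still 0 after n insertions
\Pr[\text{bit} = 0] = \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}

% false positive probability: all k probed bits happen to be 1
p_{\mathrm{FP}} \approx \left(1 - e^{-kn/m}\right)^{k}

% minimized at k_opt = (m/n) ln 2, giving
p_{\mathrm{FP}} \approx 2^{-k_{\mathrm{opt}}} = e^{-(m/n)(\ln 2)^2} \approx 0.6185^{m/n}

% worked example: m/n = 10 bits per element, k = 7
p_{\mathrm{FP}} \approx \left(1 - e^{-0.7}\right)^{7} \approx 0.0082
```

A false negative is impossible because inserting an element sets all $k$ of its bits and a standard Bloom Filter never clears a bit.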
Submitted 17 March, 2019;
originally announced March 2019.
-
Machine Learning: A Dark Side of Cancer Computing
Authors:
Ripon Patgiri,
Sabuzima Nayak,
Tanya Akutota,
Bishal Paul
Abstract:
Cancer analysis and prediction is one of the most important research fields for the well-being of humankind. Cancer data are analyzed and predicted using machine learning algorithms. Most researchers claim prediction accuracies of up to 99%. However, we show that machine learning algorithms can easily predict with an accuracy of 100% on the Wisconsin Diagnostic Breast Cancer dataset. We show that this way of gaining accuracy is an unethical approach with which the algorithms can easily be misled. In this paper, we exploit the weakness of machine learning algorithms. We perform extensive experiments to verify the correctness of our results in exploiting this weakness, and the methods are rigorously evaluated to validate our claim. In addition, this paper focuses on the correctness of accuracy. This paper reports three key outcomes of the experiments, namely, correctness of accuracies, significance of minimum accuracy, and correctness of machine learning algorithms.
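One generic way a classifier reaches a misleading 100% is by being scored on the same samples it was fitted on. The sketch below demonstrates this with a 1-nearest-neighbour classifier on synthetic points whose labels are pure noise; it is an illustration of the pitfall, not necessarily the method used in the paper, and it does not use the Wisconsin dataset. The helper names are hypothetical.

```python
import math
import random


def nn_predict(train_X, train_y, x):
    # 1-nearest neighbour: copy the label of the closest training point.
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def accuracy(train_X, train_y, X, y):
    correct = sum(nn_predict(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(200)]
y = [rng.randint(0, 1) for _ in range(200)]  # labels are random noise
train_X, train_y, test_X, test_y = X[:150], y[:150], X[150:], y[150:]

print(accuracy(train_X, train_y, train_X, train_y))  # 1.0: each point finds itself
print(accuracy(train_X, train_y, test_X, test_y))    # held-out score on random labels
```

Since the labels carry no signal, the perfect training score says nothing about the model; only the held-out score is meaningful.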
Submitted 17 March, 2019;
originally announced March 2019.
-
scaleBF: A High Scalable Membership Filter using 3D Bloom Filter
Authors:
Ripon Patgiri,
Sabuzima Nayak,
Samir Kumar Borgohain
Abstract:
Bloom Filter has been extensively deployed in various applications and research domains since its inception. Bloom Filter can reduce space consumption by an order of magnitude. Thus, Bloom Filter is used to store information about very large-scale data. There are numerous variants of Bloom Filters available; however, scalability has been a serious dilemma for Bloom Filter for years. To solve this dilemma, there are also diverse variants of Bloom Filter, but time complexity and space complexity again become the key issues. In this paper, we present a novel Bloom Filter, called scaleBF, that addresses the scalability issue without compromising performance. scaleBF deploys many 3D Bloom Filters to filter the set of items. In this paper, we theoretically compare contemporary Bloom Filters for scalability, and scaleBF outperforms them in terms of time complexity.
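The idea of scaling by deploying many 3D Bloom Filters can be sketched as a list of fixed-size 3D filters that grows when the current one reaches a capacity limit. This is a minimal illustration under assumptions: scaleBF's actual cube dimensions, hashing, and growth policy are not given in this abstract, so the dimensions, the `blake2b` hash, the capacity rule, and the names `BloomFilter3D` and `ScalableFilter` are all hypothetical.

```python
import hashlib


class BloomFilter3D:
    """Bit cube of x * y * z bits; cube[a][b] is an int holding z bits."""

    def __init__(self, dims=(16, 16, 64), num_hashes=3):
        self.dims, self.k = dims, num_hashes
        x, y, _ = dims
        self.cube = [[0] * y for _ in range(x)]
        self.count = 0

    def _cells(self, item: str):
        x, y, z = self.dims
        for i in range(self.k):
            d = hashlib.blake2b(f"{i}|{item}".encode(), digest_size=12).digest()
            h = int.from_bytes(d, "little")  # 96 bits: one 32-bit slice per axis
            yield h % x, (h >> 32) % y, (h >> 64) % z

    def add(self, item: str) -> None:
        for a, b, c in self._cells(item):
            self.cube[a][b] |= 1 << c
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all((self.cube[a][b] >> c) & 1 for a, b, c in self._cells(item))


class ScalableFilter:
    """Appends a fresh 3D filter whenever the newest one holds `capacity` items."""

    def __init__(self, capacity_per_filter: int = 1000):
        self.capacity = capacity_per_filter
        self.filters = [BloomFilter3D()]

    def add(self, item: str) -> None:
        if self.filters[-1].count >= self.capacity:
            self.filters.append(BloomFilter3D())
        self.filters[-1].add(item)

    def __contains__(self, item: str) -> bool:
        return any(item in f for f in self.filters)


sf = ScalableFilter(capacity_per_filter=100)
for i in range(250):
    sf.add(f"key-{i}")
print(len(sf.filters))
```

Queries must consult every filter, so lookup cost grows with the number of filters; keeping each filter's false positive rate low matters because the rates add up across filters.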
Submitted 15 March, 2019;
originally announced March 2019.
-
Role of Bloom Filter in Big Data Research: A Survey
Authors:
Ripon Patgiri,
Sabuzima Nayak,
Samir Kumar Borgohain
Abstract:
Big Data is the most popular emerging trend; it has become a blessing for humankind and a necessity of day-to-day life, for example, Facebook. Every person is involved in producing data, either directly or indirectly. Thus, Big Data is a high volume of data with an exponential growth rate that consists of a variety of data. Big Data touches all fields, including the Government sector, the IT industry, Business, Economy, Engineering, Bioinformatics, and other basic sciences. Thus, Big Data forms a data silo. Most of the data are duplicates and unstructured. To deal with such data silos, Bloom Filter is a precious resource for filtering out duplicate data. Also, Bloom Filter is inevitable in a Big Data storage system to optimize memory consumption. Undoubtedly, Bloom Filter uses a tiny amount of memory to filter a very large data size, and it stores information about a large set of data. Although the functionality of the Bloom Filter is limited to membership filtering, it can be adapted to various applications. Besides, the Bloom Filter is deployed in diverse fields and is also used in interdisciplinary research areas, for instance, Bioinformatics. In this article, we examine the usefulness of Bloom Filter in Big Data research.
Submitted 15 March, 2019;
originally announced March 2019.
-
Preventing DDoS using Bloom Filter: A Survey
Authors:
Ripon Patgiri,
Sabuzima Nayak,
Samir Kumar Borgohain
Abstract:
Distributed Denial-of-Service (DDoS) is a menace for service providers and a prominent issue in network security. Defeating or defending against DDoS is a prime challenge. A DDoS attack makes a service unavailable for a certain time, which harms service providers and causes loss of business revenue. Therefore, DDoS is a grand challenge to defeat. There are numerous mechanisms to defend against DDoS; this paper surveys the deployment of the Bloom Filter in defending against DDoS attacks. The Bloom Filter is a probabilistic data structure for membership queries that returns either true (the item is possibly present) or false (the item is definitely absent). The Bloom Filter uses tiny memory to store information about large data. Therefore, packet information is stored in a Bloom Filter to defend against and defeat DDoS. This paper presents a survey of DDoS defense techniques using the Bloom Filter.
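One common way to store packet information compactly is a counting Bloom Filter, which replaces bits with counters so that per-source traffic can be estimated. The sketch below is a hypothetical illustration and not a technique taken from the survey. It assumes packets are reduced to source-address strings and that a source is flagged once its estimated packet count reaches a chosen `threshold`. The names `CountingBloomFilter` and `flag_sources` are illustrative.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: counters instead of bits, so keys can be counted and removed."""

    def __init__(self, size: int, num_hashes: int) -> None:
        self.size = size
        self.k = num_hashes
        self.counters = [0] * size

    def _positions(self, key: str) -> list:
        # Double hashing over a SHA-256 digest to get k counter indices.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.counters[pos] += 1

    def remove(self, key: str) -> None:
        # Only safe for keys that were actually added; otherwise counters are corrupted.
        for pos in self._positions(key):
            self.counters[pos] -= 1

    def estimate(self, key: str) -> int:
        # The minimum counter never underestimates the true count
        # (as long as only previously added keys are removed).
        return min(self.counters[pos] for pos in self._positions(key))


def flag_sources(packets, cbf: CountingBloomFilter, threshold: int) -> set:
    """Return source addresses whose estimated packet count reaches the threshold."""
    flagged = set()
    for src in packets:
        cbf.add(src)
        if cbf.estimate(src) >= threshold:
            flagged.add(src)
    return flagged
```

Because estimates can only err upward, a heavy source is never missed. A light source may occasionally be flagged when its counters collide with those of heavy sources, so a real deployment would need to size the filter for the expected number of distinct sources.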
Submitted 15 October, 2018;
originally announced October 2018.
-
A Taxonomy on Big Data: Survey
Authors:
Ripon Patgiri
Abstract:
Big Data is the most popular paradigm nowadays, and there is almost no area it has not touched, for instance, science, engineering, economics, business, social science, and government. Big Data is used to boost organizational performance using massive datasets. Data are assets of an organization, and these data generate revenue for the organization. Therefore, Big Data is spreading everywhere to enhance organizations' revenue, and many new technologies are emerging based on Big Data. In this paper, we present a taxonomy of Big Data. Besides, we present in-depth insight into the Big Data paradigm.
Submitted 25 November, 2019; v1 submitted 25 August, 2018;
originally announced August 2018.
-
MapReduce Scheduler: A 360-degree view
Authors:
Rajdeep Das,
Rohit Pratap Singh,
Ripon Patgiri
Abstract:
Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, as it can make computation faster. Therefore, there are many scheduling algorithms to discuss based on their characteristics. Moreover, there are many shortcomings to discover in this field. In this article, we present state-of-the-art scheduling algorithms to enhance understanding of them. The algorithms are presented systematically so that this article can open many future possibilities in scheduling algorithms. In this paper, we provide in-depth insight into MapReduce scheduling algorithms. In addition, we discuss various issues of MapReduce schedulers developed for large-scale computing as well as for heterogeneous environments.
Submitted 9 April, 2017;
originally announced April 2017.