
Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark

Authors: Yelleti Vivek, Vadlamani Ravi, P. Radhakrishna

Abstract: Feature subset selection (FSS) using a wrapper approach is essentially a combinatorial optimization problem with two objective functions: the cardinality of the selected feature subset, which should be minimized, and the corresponding area under the ROC curve (AUC), which should be maximized. In this research study, we propose a novel multiplicative single objective function involving cardinality and AUC. The randomness involved in Binary Differential Evolution (BDE) may yield less diverse solutions, causing the search to get trapped in local minima. Hence, we embed the Logistic and Tent chaotic maps into the BDE and name the result Chaotic Binary Differential Evolution (CBDE). Designing a scalable solution to FSS is critical when dealing with high-dimensional and voluminous datasets. Hence, we propose a scalable island (iS) based parallelization approach in which the data is divided into multiple partitions (islands); the solutions evolve independently on each island and are eventually combined through a migration strategy. The results empirically show that the proposed Parallel Chaotic Binary Differential Evolution (P-CBDE-iS) finds better-quality feature subsets than Parallel Binary Differential Evolution (P-BDE-iS). Logistic Regression (LR) is used as the classifier owing to its simplicity and power. The speedup attained by the proposed parallel approach underscores its importance.

Submitted 8 February, 2022; originally announced February 2022.

Comments: 14 pages; 3 figures; 6 tables

MSC Class: 68T09 ACM Class: I.2
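To illustrate how chaotic maps can stand in for the uniform random numbers of binary DE, here is a minimal Python sketch. It assumes an XOR-based binary mutation (one common BDE variant; the abstract does not specify the paper's operators), a Tent map with μ = 1.999 rather than 2 (with μ = 2, double-precision iterates collapse to 0 within a few dozen steps), and a hypothetical `toy_fitness` with made-up `RELEVANT` features in place of the paper's multiplicative cardinality/AUC objective and its Logistic Regression evaluation. The names `ChaoticStream`, `cbde` and `toy_fitness` are illustrative, and the Spark island partitioning and migration are not shown.

```python
import random
from typing import Callable, List, Optional, Tuple

Bits = List[int]


def logistic_map(x: float, r: float = 4.0) -> float:
    """One step of the logistic map; r = 4 is the fully chaotic regime on (0, 1)."""
    return r * x * (1.0 - x)


def tent_map(x: float, mu: float = 1.999) -> float:
    """One step of the tent map; mu just below 2 avoids the floating-point collapse to 0."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


class ChaoticStream:
    """Produces numbers in (0, 1) by iterating a chaotic map instead of a uniform PRNG."""

    def __init__(self, step: Callable[[float], float], x0: float = 0.7) -> None:
        self.step = step
        self.x0 = x0
        self.x = x0

    def next(self) -> float:
        self.x = self.step(self.x)
        if not 0.0 < self.x < 1.0:  # orbit reached 0 or 1 (or drifted out) via rounding
            self.x = self.x0
        return self.x


# Hypothetical toy problem: features 0, 3 and 5 are the "informative" ones.
RELEVANT = {0, 3, 5}


def toy_fitness(bits: Bits) -> float:
    """Stand-in multiplicative objective: (quality proxy) x (reward for small subsets)."""
    chosen = {j for j, b in enumerate(bits) if b}
    if not chosen:
        return 0.0
    quality = len(RELEVANT & chosen) / len(RELEVANT)  # plays the role of AUC
    parsimony = 1.0 - len(chosen) / (len(bits) + 1)  # shrinks as cardinality grows
    return quality * parsimony


def cbde(fitness: Callable[[Bits], float], n_features: int, pop_size: int = 10,
         generations: int = 30, F: float = 0.5, CR: float = 0.9,
         chaos: Optional[ChaoticStream] = None, seed: int = 0) -> Tuple[Bits, float]:
    """Binary DE whose initialization, crossover and mutation tests use a chaotic stream."""
    rng = random.Random(seed)  # only used to pick donor indices and j_rand
    if chaos is None:
        chaos = ChaoticStream(logistic_map)
    pop = [[int(chaos.next() < 0.5) for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n_features)  # guarantees at least one crossed bit
            trial = pop[i][:]
            for j in range(n_features):
                if chaos.next() < CR or j == j_rand:
                    base = pop[r1][j]
                    diff = pop[r2][j] ^ pop[r3][j]  # 1 where the two donors disagree
                    # Binary "mutation": flip the base bit where donors differ, with prob. F.
                    trial[j] = base ^ diff if chaos.next() < F else base
            score = fitness(trial)
            if score >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

For example, `cbde(toy_fitness, 8, chaos=ChaoticStream(tent_map))` runs the Tent-map variant. In an island setting one could imagine each partition running such a loop on its own data and periodically exchanging its best vectors, but that arrangement is an assumption, not the paper's documented procedure.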

arXiv:2106.14007 [pdf]

doi 10.1007/s10586-022-03725-w

Scalable Feature Subset Selection for Big Data using Parallel Hybrid Evolutionary Algorithm based Wrapper in Apache Spark

Authors: Yelleti Vivek, Vadlamani Ravi, Pisipati Radhakrishna

Abstract: Owing to the emergence of large datasets, applying current sequential wrapper-based feature subset selection (FSS) algorithms increases the complexity. This limitation motivated us to propose a wrapper for FSS based on parallel and distributed hybrid evolutionary algorithms (EAs) under the Apache Spark environment. The hybrid EAs are based on Binary Differential Evolution (BDE) and Binary Threshold Accepting (BTA), a point-based EA, which is invoked to enhance the search capability and avoid premature convergence of the PB-DE. Thus, we designed two hybrid variants under the Apache Spark environment: (i) parallel binary differential evolution and threshold accepting (PB-DETA), where DE and TA work in tandem in every iteration, and (ii) parallel binary threshold accepting and differential evolution (PB-TADE), where TA and DE work in tandem in every iteration. Both PB-DETA and PB-TADE are compared with the baseline, viz., the parallel version of binary differential evolution (PB-DE). All three approaches use logistic regression (LR) to compute the fitness function, namely, the area under the ROC curve (AUC). The effectiveness of the proposed algorithms is tested on five large datasets of varying feature-space dimension, taken from the cyber security and biology domains. Notably, PB-TADE performed statistically significantly better than PB-DE and PB-DETA. We report the speedup analysis, the average AUC obtained by the most repeated feature subset, and the feature subset with high AUC and least cardinality.

Submitted 25 January, 2022; v1 submitted 26 June, 2021; originally announced June 2021.

Comments: 28 pages, 10 tables and 6 figures

MSC Class: 68W50 ACM Class: I.2
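Threshold accepting is a deterministic relative of simulated annealing: a candidate is accepted whenever its fitness loss is smaller than a threshold that shrinks over time. Below is a minimal Python sketch of a binary version, assuming single-bit-flip neighborhoods, a geometric threshold schedule, and a hypothetical `toy_fitness` in place of the paper's LR-based AUC. The function name, parameters and values are illustrative, and the Spark parallelization is not shown.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]

# Hypothetical toy objective (a cardinality-penalized score): features 0, 3, 5 matter.
RELEVANT = {0, 3, 5}


def toy_fitness(bits: Bits) -> float:
    chosen = {j for j, b in enumerate(bits) if b}
    if not chosen:
        return 0.0
    return len(RELEVANT & chosen) / len(RELEVANT) * (1.0 - len(chosen) / (len(bits) + 1))


def threshold_accepting(fitness: Callable[[Bits], float], start: Bits,
                        threshold: float = 0.05, decay: float = 0.9,
                        outer: int = 20, inner: int = 10,
                        seed: int = 0) -> Tuple[Bits, float]:
    """Binary threshold accepting (maximization) with single-bit-flip neighbors."""
    rng = random.Random(seed)
    current, cur_fit = start[:], fitness(start)
    best, best_fit = current[:], cur_fit
    for _ in range(outer):
        for _ in range(inner):
            neighbor = current[:]
            neighbor[rng.randrange(len(neighbor))] ^= 1  # add or drop one feature
            fit = fitness(neighbor)
            # Unlike pure hill climbing, accept a worse move if the loss is below threshold.
            if cur_fit - fit < threshold:
                current, cur_fit = neighbor, fit
                if cur_fit > best_fit:
                    best, best_fit = current[:], cur_fit
        threshold *= decay  # tighten the acceptance rule as the search proceeds
    return best, best_fit
```

In a DETA-style hybrid, one plausible coupling would be to run `threshold_accepting` on the best DE vector after each generation (and TA before DE for TADE); the abstract does not detail how the two stages exchange solutions, so this is an assumption.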

arXiv:1910.08861 [pdf]

A Novel Scheme of Digital Instantaneous Automatic Gain Control (DIAGC) for Pulse Radars

Authors: Sumanta Pal, Nirmala Shanmugam, Mohit Kumar, P Radhakrishna

Abstract: Several gain control schemes are used in pulse radars to prevent saturation of the receiver and overloading of the data processor, tracker or display. The use of digital processing techniques opens the door to a variety of digital automatic gain control schemes that analyze digitized return signals and control the receiver gain only in saturating clutter zones, without affecting detection in other zones. In this paper, we present a novel scheme of Digital Instantaneous Automatic Gain Control (DIAGC), which is based on digitally storing the dwell-based clutter returns and deriving the gain control from them. The returns corresponding to the first two pulse repetition times (PRTs) in a dwell are used to analyze the presence of saturating clutter zones and the depth of saturation. From the third PRT onwards, the appropriate gain control is applied at the IF stage to prevent saturation of the following stages. An FPGA-based scheme is used for digital data processing, storing, threshold calculation and gain control generation. The effect of DIAGC on pulse compression is also addressed in this paper.

Submitted 19 October, 2019; originally announced October 2019.

Comments: Presented at International Symposium of India 2011
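The gain derivation from the first two PRTs can be pictured with a small Python sketch. It assumes signed 12-bit samples (full scale 2047), a saturation threshold at 80% of full scale, an IF attenuator with 2 dB steps and a 30 dB range, and that bins clipped at full scale get maximum attenuation because their true depth cannot be read from a clipped sample. All of these values, and the function name `diagc_gain_profile`, are hypothetical; the paper's FPGA implementation and its actual depth-of-saturation analysis are not reproduced here.

```python
import math
from typing import List, Sequence


def diagc_gain_profile(prt1: Sequence[int], prt2: Sequence[int],
                       full_scale: int = 2047, threshold: float = 0.8,
                       step_db: float = 2.0, max_atten_db: float = 30.0) -> List[float]:
    """Per-range-bin IF attenuation (dB) to apply from the third PRT of the dwell onwards."""
    limit = threshold * full_scale  # level above which a bin counts as saturating clutter
    profile = []
    for a, b in zip(prt1, prt2):
        peak = max(abs(a), abs(b))  # worst case over the two analysis PRTs
        if peak >= full_scale:
            atten = max_atten_db  # clipped sample: depth unknown, use full attenuation
        elif peak > limit:
            needed = 20.0 * math.log10(peak / limit)  # dB needed to bring peak down to limit
            atten = min(max_atten_db, step_db * math.ceil(needed / step_db))  # round up to a step
        else:
            atten = 0.0  # clutter-free bin: gain untouched, so detection is unaffected
        profile.append(atten)
    return profile


if __name__ == "__main__":
    print(diagc_gain_profile([100, 2047, 1800], [120, 2000, 1700]))
```

The example call prints `[0.0, 30.0, 2.0]`: the weak bin keeps full gain, the clipped bin gets maximum attenuation, and the bin slightly above the threshold gets one attenuator step.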

Showing 1–3 of 3 results for author: Radhakrishna, P