doi 10.5755/j01.itc.51.1.29826

Multivariate Microaggregation of Set-Valued Data

Authors: Malik Imran-Daud, Muhammad Shaheen, Abbas Ahmed

Abstract: Data controllers manage immense data, and occasionally it is released publicly to help researchers conduct their studies. However, this publicly shared data may hold personally identifiable information (PII) that can be collected to re-identify a person. Therefore, an effective anonymization mechanism is required to anonymize such data before it is released publicly. Microaggregation is one of the Statistical Disclosure Control (SDC) methods that are widely used by many researchers. This method adapts the k-anonymity principle to generate k-indistinguishable records in the same clusters to preserve the privacy of the individuals. However, in these methods, the size of the clusters is fixed (i.e., k records), and the clusters generated through these methods may hold non-homogeneous records. By considering these issues, we propose an adaptive size clustering technique that aggregates homogeneous records in similar clusters, and the size of the clusters is determined after the semantic analysis of the records. To achieve this, we extend the MDAV microaggregation algorithm to semantically analyze the unstructured records by relying on the taxonomic databases (i.e., WordNet), and then aggregate them in homogeneous clusters. Furthermore, we propose a distance measure that determines the extent to which the records differ from each other, and based on this, homogeneous adaptive clusters are constructed. In experiments, we measured the cohesiveness of the clusters in order to gauge the homogeneity of records. In addition, a method is proposed to measure information loss caused by the redaction method. In experiments, the results show that the proposed mechanism outperforms the existing state-of-the-art solutions.

Submitted 4 April, 2022; originally announced April 2022.

Journal ref: Information Technology and Control, 51(1), 104-125, 2022

arXiv:2203.04111 [pdf, other]

Plumeria at SemEval-2022 Task 6: Robust Approaches for Sarcasm Detection for English and Arabic Using Transformers and Data Augmentation

Authors: Shubham Kumar Nigam, Mosab Shaheen

Abstract: This paper describes our submission to SemEval-2022 Task 6 on sarcasm detection and its five subtasks for English and Arabic. Sarcasm conveys a meaning which contradicts the literal meaning, and it is mainly found on social networks. It has a significant role in understanding the intention of the user. For detecting sarcasm, we used deep learning techniques based on transformers due to their success in the field of Natural Language Processing (NLP) without the need for feature engineering. The datasets were taken from tweets. We created new datasets by augmenting with external data or by using word embeddings and repetition of instances. Experiments were done on the datasets with different types of preprocessing because it is crucial in this task. The rank of our team was consistent across four subtasks (fourth rank in three subtasks and sixth rank in one subtask), whereas other teams might be in the top ranks for some subtasks but rank drastically lower in other subtasks. This implies the robustness and stability of the models and the techniques we used.

Submitted 8 March, 2022; originally announced March 2022.

Comments: SemEval-2022 workshop paper, submitted in NAACL-2022 conference. 8 figures and 29 tables. 8 main pages, 4 appendix pages

arXiv:2111.10776 [pdf]

A Case Study on the Independence of Speech Emotion Recognition in Bangla and English Languages using Language-Independent Prosodic Features

Authors: Fardin Saad, Hasan Mahmud, Mohammad Ridwan Kabir, Md. Alamin Shaheen, Paresha Farastu, Md. Kamrul Hasan

Abstract: A language-agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we performed a step-by-step comparative analysis of Speech Emotion Recognition (SER) using Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. Six emotions were categorized for this study: happy, angry, neutral, sad, disgust, and fear. We employed three Emotional Speech Sets (ESS), of which the first two were developed by native Bengali speakers in Bangla and English languages separately. The third was a subset of the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to carry out our proposition. In the first experiment, we measured the performance of the three speech sets individually, followed by the second experiment, where different ESS pairs were integrated to analyze the impact on SER. Finally, we measured the recognition rate by training and testing the model with different speech sets in the third experiment. Although this study reveals that SER in Bangla and English languages is mostly language-independent, some disparities were observed while recognizing emotional states like disgust and fear in these two languages. Moreover, our investigations revealed that non-native speakers convey emotions through speech, much like expressing themselves in their native tongue.

Submitted 13 May, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

Comments: 13 pages [currently under review]

arXiv:1210.7650 [pdf]

Adaptive Layered Approach using Machine Learning Techniques with Gain Ratio for Intrusion Detection Systems

Authors: Heba Ezzat Ibrahim, Sherif M. Badr, Mohamed A. Shaheen

Abstract: Intrusion Detection System (IDS) has increasingly become a crucial issue for computer and network systems. Optimizing performance of IDS becomes an important open problem which receives more and more attention from the research community. In this work, a multi-layer intrusion detection model is designed and developed to achieve high efficiency and improve the detection and classification rate accuracy. We effectively apply machine learning techniques (C5 decision tree, Multilayer Perceptron neural network and Naïve Bayes) using gain ratio for selecting the best features for each layer so as to use smaller storage space and get higher intrusion detection performance. Our experimental results showed that the proposed multi-layer model using C5 decision tree achieves higher classification rate accuracy, using feature selection by Gain Ratio, and less false alarm rate than MLP and Naïve Bayes. Using Gain Ratio enhances the accuracy of U2R and R2L for the three machine learning techniques (C5, MLP and Naïve Bayes) significantly. MLP has high classification rate when using the whole 41 features in DoS and Probe layers.

Submitted 29 October, 2012; originally announced October 2012.

Comments: 7 pages

Journal ref: (IJCA)International Journal of Computer Applications, Volume 56, No.7, 2012, 10-16

arXiv:1208.5997 [pdf]

Phases vs. Levels using Decision Trees for Intrusion Detection Systems

Authors: Heba Ezzat Ibrahim, Sherif M. Badr, Mohamed A. Shaheen

Abstract: Security of computers and the networks that connect them is increasingly becoming of great significance. An intrusion detection system is one of the security defense tools for computer networks. This paper compares two different model approaches for representing intrusion detection systems by using decision tree techniques. These approaches are the Phase-model approach and the Level-model approach. Each model is implemented by using two techniques, New Attacks and Data Partitioning. The experimental results showed that the Phase approach has a higher classification rate in both New Attacks and Data Partitioning techniques than the Level approach.

Submitted 29 August, 2012; originally announced August 2012.

Comments: 7 pages; (IJCSIS) International Journal of Computer Science and Information Security, Vol. 10, No. 8, 2012

Showing 1–5 of 5 results for author: Shaheen, M