-
A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages
Authors:
Mehdi Abdelhamid,
Faical Azouaou,
Sofiane Batata
Abstract:
In academia, plagiarism is certainly not an emerging concern, but it has grown in magnitude with the popularisation of the Internet and the ease of access to a worldwide source of content, rendering human-only intervention insufficient. Despite that, plagiarism is far from being an unaddressed problem, as computer-assisted plagiarism detection is currently an active area of research that falls within the fields of Information Retrieval (IR) and Natural Language Processing (NLP). Many software solutions have emerged to help fulfil this task, and this paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings. Eight systems were compared with respect to their features, usability, technical aspects, and performance in detecting three levels of obfuscation from different sources: verbatim, paraphrase, and cross-language plagiarism. An in-depth examination of technical forms of plagiarism was also performed in the context of this study. In addition, a survey of plagiarism typologies and classifications proposed by different authors is provided.
Submitted 10 January, 2022;
originally announced January 2022.
-
Sexism detection: The first corpus in Algerian dialect with a code-switching in Arabic/ French and English
Authors:
Imane Guellil,
Ahsan Adeel,
Faical Azouaou,
Mohamed Boubred,
Yousra Houichi,
Akram Abdelhaq Moumna
Abstract:
In this paper, an approach for detecting hate speech against women in the Arabic community on social media (e.g. YouTube) is proposed. In the literature, similar works have been presented for other languages such as English. However, to the best of our knowledge, not much work has been conducted in the Arabic language. A new hate speech corpus (Arabic_fr_en) is developed using three different annotators. For corpus validation, three different machine learning algorithms are used: a deep Convolutional Neural Network (CNN), a long short-term memory (LSTM) network, and a bi-directional LSTM (Bi-LSTM) network. Simulation results show that the CNN model performs best, achieving an F1-score of up to 86% on the unbalanced corpus, compared to LSTM and Bi-LSTM.
Submitted 3 April, 2021;
originally announced April 2021.
-
Detecting Stable Communities in Link Streams at Multiple Temporal Scales
Authors:
Souaad Boudebza,
Remy Cazabet,
Omar Nouali,
Faical Azouaou
Abstract:
Link streams model interactions over time in a wide range of fields. Under this model, the challenge is to efficiently mine both temporal and topological structures. Community detection and change point detection are among the most powerful tools to analyze such evolving interactions. In this paper, we build on both to detect stable community structures by identifying change points within meaningful communities. Unlike existing dynamic community detection algorithms, the proposed method is able to discover stable communities efficiently at multiple temporal scales. We test the effectiveness of our method on synthetic networks, and on high-resolution time-varying networks of contacts drawn from real social networks.
Submitted 24 July, 2019;
originally announced July 2019.
-
Arabic natural language processing: An overview
Authors:
Imane Guellil,
Houda Saâdane,
Faical Azouaou,
Billel Gueni,
Damien Nouvel
Abstract:
Arabic is recognised as the 4th most used language of the Internet. Arabic has three main varieties: (1) Classical Arabic (CA), (2) Modern Standard Arabic (MSA), (3) Arabic Dialect (AD). MSA and AD can be written either in Arabic or in Roman script (Arabizi), which corresponds to Arabic written with Latin letters, numerals and punctuation. Due to the complexity of this language and the number of corresponding challenges for NLP, many surveys have been conducted in order to synthesise the work done on Arabic. However, these surveys principally focus on two varieties of Arabic (MSA and AD, written in Arabic letters only); they are also somewhat dated (no such survey since 2015) and therefore do not cover recent resources and tools. To bridge the gap, we propose a survey focusing on 90 recent research papers (74% of which were published after 2015). Our study presents and classifies the work done on the three varieties of Arabic, concentrating on both Arabic and Arabizi, and associates each work with its publicly available resources where they exist.
Submitted 7 March, 2019;
originally announced March 2019.
-
SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis
Authors:
Imane Guellil,
Ahsan Adeel,
Faical Azouaou,
Amir Hussain
Abstract:
Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline, establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two widely used scripts on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic and 4000 in Arabizi). The achieved F1-score is up to 72% and 78% for the Arabic and Arabizi test sets, respectively. Ongoing work is aimed at integrating a transliteration process for Arabizi messages to further improve the obtained results.
Submitted 15 August, 2018;
originally announced August 2018.
-
Hybrid approach for transliteration of Algerian arabizi: a primary study
Authors:
Imane Guellil,
Faical Azouaou,
Fodil Benali,
Ala-Eddine Hachani,
Houda Saadane
Abstract:
In this paper, we present a hybrid approach for the transliteration of Algerian Arabizi. We define a set of rules that enable the passage from Arabizi to Arabic. Through these rules, we generate a set of candidates for the transliteration of each Arabizi word into Arabic. Then, we extract the best candidate. This approach was evaluated using three test corpora, and the obtained results show an improvement in the precision score, which reaches 75.11% for the best result. These results allow us to verify that our approach is very competitive compared to other works that treat Arabizi transliteration in general.
Keywords: Arabizi, Algerian dialect, Algerian Arabizi, transliteration.
Submitted 10 August, 2018;
originally announced August 2018.
-
OLCPM: An Online Framework for Detecting Overlapping Communities in Dynamic Social Networks
Authors:
Souâad Boudebza,
Rémy Cazabet,
Faiçal Azouaou,
Omar Nouali
Abstract:
Community structure is one of the most prominent features of complex networks. Community structure detection is of great importance to provide insights into the network structure and functionalities. Most proposals focus on static networks. However, finding communities in a dynamic network is even more challenging, especially when communities overlap with each other. In this article, we present an online algorithm, called OLCPM, based on clique percolation and label propagation methods. OLCPM can detect overlapping communities and works on temporal networks with a fine granularity. By locally updating the community structure, OLCPM delivers significant improvement in running time compared with previous clique percolation techniques. The experimental results on both synthetic and real-world networks illustrate the effectiveness of the method.
Submitted 11 April, 2018;
originally announced April 2018.
-
ASDA : Analyseur Syntaxique du Dialecte Algérien dans un but d'analyse sémantique
Authors:
Imène Guellil,
Faiçal Azouaou
Abstract:
Opinion mining and sentiment analysis in social media is a research issue of great interest to the scientific community. However, before beginning this analysis, we are faced with a set of problems, in particular the richness of languages and dialects within these media. To address this problem, we propose in this paper an approach for the construction and implementation of a syntactic analyzer named ASDA. This tool is a parser for the Algerian dialect that labels the terms of a given corpus. Thus, we construct a labeling table containing, for each term, its stem and its different prefixes and suffixes, allowing us to determine the different grammatical parts, a sort of POS tagging. This labeling will later serve in the semantic processing of the Algerian dialect, such as automatic translation of this dialect or sentiment analysis.
Submitted 26 July, 2017;
originally announced July 2017.