Showing 1–19 of 19 results for author: Zaghouani, W

Searching in archive cs.
  1. arXiv:2406.05559  [pdf, other]

    cs.CL cs.AI

    ThatiAR: Subjectivity Detection in Arabic News Sentences

    Authors: Reem Suwaileh, Maram Hasanain, Fatema Hubail, Wajdi Zaghouani, Firoj Alam

    Abstract: Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and ot…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Subjectivity, Sentiment, Disinformation, Misinformation, Fake news, LLMs, Transformers, Instruction Dataset

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  2. arXiv:2403.18314  [pdf, other]

    cs.CL cs.AI

    Chinese Offensive Language Detection: Current Status and Future Directions

    Authors: Yunze Xiao, Houda Bouamor, Wajdi Zaghouani

    Abstract: Despite the considerable efforts being made to monitor and regulate user-generated content on social media platforms, the pervasiveness of offensive language, such as hate speech or cyberbullying, in the digital space remains a significant challenge. Given the importance of maintaining a civilized and respectful online environment, there is an urgent and growing need for automatic systems capable…

    Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  3. arXiv:2401.00127  [pdf, other]

    cs.CV cs.SI

    Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models

    Authors: Ashhadul Islam, Md. Rafiul Biswas, Wajdi Zaghouani, Samir Brahim Belhaouari, Zubair Shah

    Abstract: The synergy of language and vision models has given rise to Large Language and Vision Assistant models (LLVAs), designed to engage users in rich conversational experiences intertwined with image-based queries. These comprehensive multimodal models seamlessly integrate vision encoders with Large Language Models (LLMs), expanding their applications in general-purpose language and visual comprehen…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 5 pages, 6 figures, 4 tables, accepted at The International Symposium on Foundation and Large Language Models (FLLM2023)

    Journal ref: https://fllm-conference.org/2023/

  4. arXiv:2312.12016  [pdf]

    cs.SI

    Potentials of ChatGPT for Annotating Vaccine Related Tweets

    Authors: Md. Rafiul Biswas, Farida Mohsen, Zubair Shah, Wajdi Zaghouani

    Abstract: This study evaluates ChatGPT's performance in annotating vaccine-related Arabic tweets by comparing its annotations with human annotations. A dataset of 2,100 tweets representing various factors contributing to vaccine hesitancy was examined. Two domain experts annotated the data, with a third resolving conflicts. ChatGPT was then employed to annotate the same dataset using specific prompts for ea…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 6 pages, 5 figures, two tables, accepted at The International Symposium on Foundation and Large Language Models (FLLM2023)

    Journal ref: https://fllm-conference.org/2023/

  5. arXiv:2312.12006  [pdf]

    cs.CL cs.SI

    Can ChatGPT be Your Personal Medical Assistant?

    Authors: Md. Rafiul Biswas, Ashhadul Islam, Zubair Shah, Wajdi Zaghouani, Samir Brahim Belhaouari

    Abstract: The advanced large language model (LLM) ChatGPT has shown its potential in different domains and remains unbeaten due to its characteristics compared to other LLMs. This study aims to evaluate the potential of using a fine-tuned ChatGPT model as a personal medical assistant in the Arabic language. To do so, this study uses publicly available online questions and answering datasets in Arabic langua…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 5 pages, 7 figures, two tables, accepted at The International Symposium on Foundation and Large Language Models (FLLM2023)

    Journal ref: The International Symposium on Foundation and Large Language Models (FLLM2023) https://fllm-conference.org/2023/

  6. arXiv:2311.03179  [pdf, other]

    cs.CL cs.AI

    ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text

    Authors: Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat

    Abstract: We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023. ArAIEval offers two tasks over Arabic text: (i) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (ii) disinformation detection in binary and multiclass setups over tweets. A total of 20 teams participa…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ArabicNLP-23 (EMNLP-23), propaganda, disinformation, misinformation, fake news

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  7. arXiv:2303.09823  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages

    Authors: Angel Felipe Magnossão de Paula, Imene Bensalem, Paolo Rosso, Wajdi Zaghouani

    Abstract: This paper describes our participation in the shared task of hate speech detection, which is one of the subtasks of the CERIST NLP Challenge 2022. Our experiments evaluate the performance of six transformer models and their combination using 2 ensemble approaches. The best results on the training set, in a five-fold cross validation scenario, were obtained by using the ensemble approach based on t…

    Submitted 17 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 tables

  8. arXiv:2211.10057  [pdf, other]

    cs.CL cs.AI cs.LG

    Overview of the WANLP 2022 Shared Task on Propaganda Detection in Arabic

    Authors: Firoj Alam, Hamdy Mubarak, Wajdi Zaghouani, Giovanni Da San Martino, Preslav Nakov

    Abstract: Propaganda is the expression of an opinion or an action by an individual or a group deliberately designed to influence the opinions or the actions of other individuals or groups with reference to predetermined ends, which is achieved by means of well-defined rhetorical and psychological devices. Propaganda techniques are commonly used in social media to manipulate or to mislead users. Thus, there…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted at WANLP-22 (EMNLP-22), propaganda, disinformation, misinformation, fake news, memes, multimodality. arXiv admin note: text overlap with arXiv:2109.08013, arXiv:2105.09284

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  9. arXiv:2205.06025  [pdf, other]

    cs.CL

    DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

    Authors: Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov

    Abstract: The task of machine reading comprehension (MRC) is a useful benchmark to evaluate the natural language understanding of machines. It has gained popularity in the natural language processing (NLP) field mainly due to the large number of datasets released for many languages. However, the research in MRC has been understudied in several domains, including religious texts. The goal of the Qur'an QA 20…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted to OSACT5 Co-located with LREC 2022

  10. arXiv:2109.12986  [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    Findings of the NLP4IF-2021 Shared Tasks on Fighting the COVID-19 Infodemic and Censorship Detection

    Authors: Shaden Shaar, Firoj Alam, Giovanni Da San Martino, Alex Nikolov, Wajdi Zaghouani, Preslav Nakov, Anna Feldman

    Abstract: We present the results and the main findings of the NLP4IF-2021 shared tasks. Task 1 focused on fighting the COVID-19 infodemic in social media, and it was offered in Arabic, Bulgarian, and English. Given a tweet, it asked to predict whether that tweet contains a verifiable claim, and if so, whether it is likely to be false, is of general interest, is likely to be harmful, and is worthy of manual…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: COVID-19, infodemic, harmfulness, check-worthiness, censorship, social media, tweets, Arabic, Bulgarian, English, Chinese

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

    Journal ref: NLP4IF-2021

  11. arXiv:2007.09655  [pdf, other]

    cs.SI cs.CL

    Political Framing: US COVID19 Blame Game

    Authors: Chereen Shurafa, Kareem Darwish, Wajdi Zaghouani

    Abstract: Through the use of Twitter, framing has become a prominent presidential campaign tool for politically active users. Framing is used to influence thoughts by evoking a particular perspective on an event. In this paper, we show that, rather than the COVID19 pandemic being viewed as a public health issue, the political rhetoric surrounding it is mostly shaped through a blame frame (blame Trump, China, or…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: Social Informatics 2020 (SocInfo2020)

  12. arXiv:2005.00033  [pdf, other]

    cs.CL cs.CY cs.IR

    Fighting the COVID-19 Infodemic: Modeling the Perspective of Journalists, Fact-Checkers, Social Media Platforms, Policy Makers, and the Society

    Authors: Firoj Alam, Shaden Shaar, Fahim Dalvi, Hassan Sajjad, Alex Nikolov, Hamdy Mubarak, Giovanni Da San Martino, Ahmed Abdelali, Nadir Durrani, Kareem Darwish, Abdulaziz Al-Homaid, Wajdi Zaghouani, Tommaso Caselli, Gijs Danoe, Friso Stolk, Britt Bruntink, Preslav Nakov

    Abstract: With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic. Fighting this infodemic has been declared one of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreadin…

    Submitted 22 September, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: disinformation, misinformation, factuality, fact-checking, fact-checkers, check-worthiness, Social Media Platforms, COVID-19, social media

    MSC Class: 68T50

    ACM Class: I.2; I.2.7

    Journal ref: EMNLP-2021 (Findings)

  13. arXiv:1808.08392  [pdf, other]

    cs.CL

    MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction

    Authors: Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor, Wajdi Zaghouani, Kemal Oflazer

    Abstract: In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic. The MADARi framework provides intuitive interfaces for annotating text and managing the annotation process of a large number of sizable documents. Morphological annotation includes indicating, for a word, in context, its baseword, clitics, part-of-speech,…

    Submitted 25 August, 2018; originally announced August 2018.

    Comments: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

  14. arXiv:1808.07678  [pdf]

    cs.CL

    Guidelines and Annotation Framework for Arabic Author Profiling

    Authors: Wajdi Zaghouani, Anis Charfi

    Abstract: In this paper, we present the annotation pipeline and the guidelines we wrote as part of an effort to create a large manually annotated Arabic author profiling dataset from various social media sources covering 16 Arabic countries and 11 dialectal regions. The target size of the annotated ARAP-Tweet corpus is more than 2.4 million words. We illustrate and summarize our general and dialect-specific…

    Submitted 23 August, 2018; originally announced August 2018.

    Comments: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools: with ArabicWeb16 Data Challenge. arXiv admin note: text overlap with arXiv:1808.07674

  15. arXiv:1808.07674  [pdf]

    cs.CL

    Arap-Tweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification

    Authors: Wajdi Zaghouani, Anis Charfi

    Abstract: In this paper, we present Arap-Tweet, which is a large-scale and multi-dialectal corpus of Tweets from 11 regions and 16 countries in the Arab world representing the major Arabic dialectal varieties. To build this corpus, we collected data from Twitter and we provided a team of experienced annotators with annotation guidelines that they used to annotate the corpus for age categories, gender, and d…

    Submitted 23 August, 2018; originally announced August 2018.

    Comments: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

  16. arXiv:1808.05542  [pdf, other]

    cs.CL

    Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness

    Authors: Pepa Atanasova, Alberto Barron-Cedeno, Tamer Elsayed, Reem Suwaileh, Wajdi Zaghouani, Spas Kyuchukov, Giovanni Da San Martino, Preslav Nakov

    Abstract: We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 1: Check-Worthiness. The task asks to predict which claims in a political debate should be prioritized for fact-checking. In particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for…

    Submitted 8 August, 2018; originally announced August 2018.

    Comments: Computational journalism, Check-worthiness, Fact-checking, Veracity

    MSC Class: 68T50

    ACM Class: I.2.7

    Journal ref: CLEF-2018

  17. arXiv:1702.07835  [pdf]

    cs.CL

    Critical Survey of the Freely Available Arabic Corpora

    Authors: Wajdi Zaghouani

    Abstract: The availability of corpora is a major factor in building natural language processing applications. However, the costs of acquiring corpora can prevent some researchers from going further in their endeavours. The ease of access to freely available corpora is urgently needed in the NLP research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehen…

    Submitted 25 February, 2017; originally announced February 2017.

    Comments: Published in the Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), OSACT Workshop. Reykjavik, Iceland, 26-31 May 2014

  18. arXiv:cs/0609065  [pdf]

    cs.CL cs.IR

    Geocoding multilingual texts: Recognition, disambiguation and visualisation

    Authors: Bruno Pouliquen, Marco Kimler, Ralf Steinberger, Camelia Ignat, Tamara Oellinger, Ken Blackler, Flavio Fuart, Wajdi Zaghouani, Anna Widiger, Ann-Charlotte Forslund, Clive Best

    Abstract: We are presenting a method to recognise geographical references in free text. Our tool must work on various languages with a minimum of language-dependent resources, except a gazetteer. The main difficulty is to disambiguate these place names by distinguishing places from persons and by selecting the most likely place out of a list of homographic place names world-wide. The system uses a number…

    Submitted 12 September, 2006; originally announced September 2006.

    Comments: 6 pages

    ACM Class: H.3.1; H.3.3; H.3.4

    Journal ref: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), pp. 53-58. Genoa, Italy, 24-26 May 2006

  19. arXiv:cs/0609051  [pdf]

    cs.CL cs.IR

    Multilingual person name recognition and transliteration

    Authors: Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Irina Temnikova, Anna Widiger, Wajdi Zaghouani, Jan Zizka

    Abstract: We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the co-occurrence of their names in related news. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writin…

    Submitted 11 September, 2006; originally announced September 2006.

    Comments: Explains the technology behind the JRC's NewsExplorer application, which is freely accessible at http://press.jrc.it/NewsExplorer

    ACM Class: H.3.1; H.3.3; H.3.4; H.3.5

    Journal ref: Journal CORELA - Cognition, Representation, Langage. Numeros speciaux, Le traitement lexicographique des noms propres. December 2005. ISSN 1638-5748