
Showing 1–6 of 6 results for author: Guellil, I

Searching in archive cs.
  1. arXiv:2406.14312 [pdf, other]

    cs.CL cs.AI

    Infusing clinical knowledge into tokenisers for language models

    Authors: Abul Hasan, **ge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil, Huayu Zhang, Arlene Casey, Beatrice Alex, Bruce Guthrie, Honghan Wu

    Abstract: This study introduces a novel knowledge-enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at the initialisation stage, K-Tokeniser populates global representations of tokens based on the semantic types of domain concepts (such as drugs or diseases), drawn either from a domain ontology such as the Unified Medical Language System or from the training data of the task-related corpus. At t…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  2. arXiv:2104.01443 [pdf, ps, other]

    cs.CL

    Sexism detection: The first corpus in Algerian dialect with a code-switching in Arabic/ French and English

    Authors: Imane Guellil, Ahsan Adeel, Faical Azouaou, Mohamed Boubred, Yousra Houichi, Akram Abdelhaq Moumna

    Abstract: In this paper, an approach for detecting hate speech against women in the Arabic community on social media (e.g. YouTube) is proposed. In the literature, similar works have been presented for other languages such as English. However, to the best of our knowledge, little work has been conducted on the Arabic language. A new hate speech corpus (Arabic_fr_en) is developed using three different annota…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: This paper was accepted at the AfricanNLP workshop (EACL conference 2021)

  3. Arabic natural language processing: An overview

    Authors: Imane Guellil, Houda Saâdane, Faical Azouaou, Billel Gueni, Damien Nouvel

    Abstract: Arabic is recognised as the 4th most used language on the Internet. Arabic has three main varieties: (1) Classical Arabic (CA), (2) Modern Standard Arabic (MSA), and (3) Arabic Dialect (AD). MSA and AD can be written either in Arabic script or in Roman script (Arabizi), which corresponds to Arabic written with Latin letters, numerals and punctuation. Due to the complexity of this language and the number of…

    Submitted 7 March, 2019; originally announced March 2019.

  4. arXiv:1808.05079 [pdf, other]

    cs.CL

    SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

    Authors: Imane Guellil, Ahsan Adeel, Faical Azouaou, Amir Hussain

    Abstract: Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construc…

    Submitted 15 August, 2018; originally announced August 2018.

    Comments: To appear in the 9th International Conference on Brain Inspired Cognitive Systems (BICS 2018)

    ACM Class: I.5.0; I.2.7

  5. arXiv:1808.03437 [pdf]

    cs.CL

    Hybrid approach for transliteration of Algerian arabizi: a primary study

    Authors: Imane Guellil, Faical Azouaou, Fodil Benali, Ala-Eddine Hachani, Houda Saadane

    Abstract: In this paper, we present a hybrid approach for the transliteration of Algerian Arabizi. We define a set of rules that enable the passage from Arabizi to Arabic. Through these rules, we generate a set of candidates for the transliteration of each Arabizi word into Arabic. Then, we extract the best candidate. This appr…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: in French

    Journal ref: 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN), May 2018, Rennes, France

  6. arXiv:1707.08998 [pdf]

    cs.CL

    ASDA : Analyseur Syntaxique du Dialecte Algérien dans un but d'analyse sémantique

    Authors: Imène Guellil, Faiçal Azouaou

    Abstract: Opinion mining and sentiment analysis in social media are research issues of great interest to the scientific community. However, before beginning this analysis, we are faced with a set of problems, in particular the richness of languages and dialects within these media. To address this problem, we propose in this paper an approach to the construction and implementation of Syntactic…

    Submitted 26 July, 2017; originally announced July 2017.