Showing 1–17 of 17 results for author: Alhafni, B

  1. arXiv:2407.03032

    cs.CL

    Strategies for Arabic Readability Modeling

    Authors: Juan Piñeros Liberato, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash

    Abstract: Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility. However, Arabic readability assessment is a challenging task due to Arabic's morphological richness and limited readability resources. In this paper, we present a set of experimental results on Arabic readability assessment using a diverse range of approaches, from rule-bas…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ArabicNLP 2024, ACL

  2. arXiv:2407.03020

    cs.CL

    Exploiting Dialect Identification in Automatic Dialectal Text Normalization

    Authors: Bashar Alhafni, Sarah Al-Towaity, Ziyad Fawzy, Fatema Nassar, Fadhl Eryani, Houda Bouamor, Nizar Habash

    Abstract: Dialectal Arabic is the primary spoken language used by native Arabic speakers in daily communication. The rise of social media platforms has notably expanded its use as a written language. However, Arabic dialects do not have standard orthographies. This, combined with the inherent noise in user-generated content on social media, presents a major challenge to NLP applications dealing with Dialect…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ArabicNLP 2024, ACL

  3. arXiv:2404.18615

    cs.CL

    The SAMER Arabic Text Simplification Corpus

    Authors: Bashar Alhafni, Reem Hazim, Juan Piñeros Liberato, Muhamed Al Khalil, Nizar Habash

    Abstract: We present the SAMER Corpus, the first manually annotated Arabic parallel corpus for text simplification targeting school-aged learners. Our corpus comprises texts of 159K words selected from 15 publicly available Arabic fiction novels most of which were published between 1865 and 1955. Our corpus includes readability level annotations at both the document and word levels, as well as two simplifie…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024. 15 pages, 6 tables, 1 figure

  4. arXiv:2402.16472

    cs.CL cs.AI

    mEdIT: Multilingual Text Editing via Instruction Tuning

    Authors: Vipul Raheja, Dimitris Alikaniotis, Vivek Kulkarni, Bashar Alhafni, Dhruv Kumar

    Abstract: We introduce mEdIT, a multi-lingual extension to CoEdIT -- the recent state-of-the-art text editing models for writing assistance. mEdIT models are trained by fine-tuning multi-lingual large, pre-trained language models (LLMs) via instruction tuning. They are designed to take instructions from the user specifying the attributes of the desired text in the form of natural language instructions, such…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to NAACL 2024 (Main). 23 pages, 8 tables, 11 figures

    ACM Class: I.2.7

  5. arXiv:2402.04914

    cs.CL

    Personalized Text Generation with Fine-Grained Linguistic Control

    Authors: Bashar Alhafni, Vivek Kulkarni, Dhruv Kumar, Vipul Raheja

    Abstract: As the text generation capabilities of large language models become increasingly prominent, recent studies have focused on controlling particular aspects of the generated text to make it more personalized. However, most research on controllable text generation focuses on controlling the content or modeling specific high-level/coarse-grained attributes that reflect authors' writing styles, such as…

    Submitted 7 February, 2024; originally announced February 2024.

  6. arXiv:2305.14734

    cs.CL

    Advancements in Arabic Grammatical Error Detection and Correction: An Empirical Investigation

    Authors: Bashar Alhafni, Go Inoue, Christian Khairallah, Nizar Habash

    Abstract: Grammatical error correction (GEC) is a well-explored problem in English with many existing models and datasets. However, research on GEC in morphologically rich languages has been limited due to challenges such as data scarcity and language complexity. In this paper, we present the first results on Arabic GEC using two newly developed Transformer-based pretrained sequence-to-sequence models. We a…

    Submitted 9 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023

  7. Perception, performance, and detectability of conversational artificial intelligence across 32 university courses

    Authors: Hazem Ibrahim, Fengyuan Liu, Rohail Asim, Balaraju Battu, Sidahmed Benabderrahmane, Bashar Alhafni, Wifag Adnan, Tuka Alhanai, Bedoor AlShebli, Riyadh Baghdadi, Jocelyn J. Bélanger, Elena Beretta, Kemal Celik, Moumena Chaqfeh, Mohammed F. Daqaq, Zaynab El Bernoussi, Daryl Fougnie, Borja Garcia de Soto, Alberto Gandolfi, Andras Gyorgy, Nizar Habash, J. Andrew Harris, Aaron Kaufman, Lefteris Kirousis, Korhan Kocak, et al. (14 additional authors not shown)

    Abstract: The emergence of large language models has led to the development of powerful tools such as ChatGPT that can produce text indistinguishable from human-generated work. With the increasing accessibility of such technology, students across the globe may utilize it to help with their school work -- a possibility that has sparked discussions on the integrity of student evaluations in the age of artific…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: 17 pages, 4 figures

  8. arXiv:2210.14190

    cs.CL

    CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization

    Authors: Hossein Rajaby Faghihi, Bashar Alhafni, Ke Zhang, Shihao Ran, Joel Tetreault, Alejandro Jaimes

    Abstract: Social media has increasingly played a key role in emergency response: first responders can use public posts to better react to ongoing crisis events and deploy the necessary resources where they are most needed. Timeline extraction and abstractive summarization are critical technical tasks to leverage large numbers of social media posts about events. Unfortunately, there are few datasets for benc…

    Submitted 25 October, 2022; originally announced October 2022.

  9. arXiv:2210.12410

    cs.CL

    The Shared Task on Gender Rewriting

    Authors: Bashar Alhafni, Nizar Habash, Houda Bouamor, Ossama Obeid, Sultan Alrowili, Daliyah Alzeer, Khawlah M. Alshanqiti, Ahmed ElBakry, Muhammad ElNokrashy, Mohamed Gabr, Abderrahmane Issam, Abdelrahim Qaddoumi, K. Vijay-Shanker, Mahmoud Zyate

    Abstract: In this paper, we present the results and findings of the Shared Task on Gender Rewriting, which was organized as part of the Seventh Arabic Natural Language Processing Workshop. The task of gender rewriting refers to generating alternatives of a given sentence to match different target user gender contexts (e.g., female speaker with a male listener, a male speaker with a male listener, etc.). Thi…

    Submitted 22 October, 2022; originally announced October 2022.

  10. arXiv:2210.10672

    cs.CL

    Arabic Word-level Readability Visualization for Assisted Text Simplification

    Authors: Reem Hazim, Hind Saddiki, Bashar Alhafni, Muhamed Al Khalil, Nizar Habash

    Abstract: This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization. The add-on includes a lemmatization component that is connected to a five-level readability lexicon and Arabic WordNet-based substitution suggestions. The add-on can be used for assessing the reading difficulty of a text and identifying difficult words as part of the task of manual text simplif…

    Submitted 19 October, 2022; originally announced October 2022.

  11. arXiv:2210.07538

    cs.CL

    The User-Aware Arabic Gender Rewriter

    Authors: Bashar Alhafni, Ossama Obeid, Nizar Habash

    Abstract: We introduce the User-Aware Arabic Gender Rewriter, a user-centric web-based system for Arabic gender rewriting in contexts involving two users. The system takes either Arabic or English sentences as input, and provides users with the ability to specify their desired first and/or second person target genders. The system outputs gender rewritten alternatives of the Arabic input sentences (or their…

    Submitted 14 October, 2022; originally announced October 2022.

  12. arXiv:2207.02356

    cs.CL

    Zero-shot Cross-Linguistic Learning of Event Semantics

    Authors: Malihe Alikhani, Thomas Kober, Bashar Alhafni, Yue Chen, Mert Inan, Elizabeth Nielsen, Shahab Raji, Mark Steedman, Matthew Stone

    Abstract: Typologically diverse languages offer systems of lexical and grammatical aspect that allow speakers to focus on facets of event structure in ways that comport with the specific communicative setting and discourse constraints they face. In this paper, we look specifically at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish and describe a computational model for predict…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted at INLG 2022

  13. arXiv:2205.02211

    cs.CL

    User-Centric Gender Rewriting

    Authors: Bashar Alhafni, Nizar Habash, Houda Bouamor

    Abstract: In this paper, we define the task of gender rewriting in contexts involving two users (I and/or You) - first and second grammatical persons with independent grammatical gender preferences. We focus on Arabic, a gender-marking morphologically rich language. We develop a multi-step system that combines the positive aspects of both rule-based and neural rewriting models. Our results successfully demo…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted at NAACL 2022

  14. arXiv:2110.09216

    cs.CL

    The Arabic Parallel Gender Corpus 2.0: Extensions and Analyses

    Authors: Bashar Alhafni, Nizar Habash, Houda Bouamor

    Abstract: Gender bias in natural language processing (NLP) applications, particularly machine translation, has been receiving increasing attention. Much of the research on this issue has focused on mitigating gender bias in English NLP models and systems. Addressing the problem in poorly resourced, and/or morphologically rich languages has lagged behind, largely due to the lack of datasets and resources. In…

    Submitted 18 October, 2021; originally announced October 2021.

  15. arXiv:2103.06678

    cs.CL

    The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

    Authors: Go Inoue, Bashar Alhafni, Nurpeiis Baimukan, Houda Bouamor, Nizar Habash

    Abstract: In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models. To do so, we build three pre-trained language models across three variants of Arabic: Modern Standard Arabic (MSA), dialectal Arabic, and classical Arabic, in addition to a fourth language model which is pre-trained on a mix of the three. We also examine the imp…

    Submitted 4 September, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: Accepted to WANLP 2021

  16. arXiv:1904.11942

    cs.CL

    Contextualized Word Embeddings Enhanced Event Temporal Relation Extraction for Story Understanding

    Authors: Rujun Han, Mengyue Liang, Bashar Alhafni, Nanyun Peng

    Abstract: Learning causal and temporal relationships between events is an important step towards deeper story and commonsense understanding. Though there are abundant datasets annotated with event relations for story comprehension, many have no empirical results associated with them. In this work, we establish strong baselines for event temporal relation extraction on two under-explored story narrative data…

    Submitted 26 April, 2019; originally announced April 2019.

  17. arXiv:1901.00211

    cs.CV

    Mapping Areas using Computer Vision Algorithms and Drones

    Authors: Bashar Alhafni, Saulo Fernando Guedes, Lays Cavalcante Ribeiro, Juhyun Park, Jeongkyu Lee

    Abstract: The goal of this paper is to implement a system, titled as Drone Map Creator (DMC) using Computer Vision techniques. DMC can process visual information from an HD camera in a drone and automatically create a map by stitching together visual information captured by a drone. The proposed approach employs the Speeded up robust features (SURF) method to detect the key points for each image frame; then…

    Submitted 1 January, 2019; originally announced January 2019.

    Comments: 7 pages, 12 figures. This work was presented at the American Society for Engineering Education (ASEE) Northeast Conference in 2016