
Showing 1–28 of 28 results for author: Abdalla, M

  1. arXiv:2406.18757  [pdf, other]

    cs.ET cs.AI cs.LG

    The Impact of Feature Representation on the Accuracy of Photonic Neural Networks

    Authors: Mauricio Gomes de Queiroz, Paul Jimenez, Raphael Cardoso, Mateus Vidaletti Costa, Mohab Abdalla, Ian O'Connor, Alberto Bosio, Fabio Pavanello

    Abstract: Photonic Neural Networks (PNNs) are gaining significant interest in the research community due to their potential for high parallelization, low latency, and energy efficiency. PNNs compute using light, which leads to several differences in implementation when compared to electronics, such as the need to represent input features in the photonic domain before feeding them into the network. In this e…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.04493  [pdf, other]

    cs.CV cs.CL

    CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset

    Authors: Abdelrahman Abdallah, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Ibrahim Abdelhalim, Mohamed Elkasaby, Yasser ElBendary, Adam Jatowt

    Abstract: In the fields of Optical Character Recognition (OCR) and Natural Language Processing (NLP), integrating multilingual capabilities remains a critical challenge, especially when considering languages with complex scripts such as Arabic. This paper introduces the Comprehensive Post-OCR Parsing and Receipt Understanding Dataset (CORU), a novel dataset specifically designed to enhance OCR and informati…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2405.19387  [pdf, other]

    cs.CV

    Video Anomaly Detection in 10 Years: A Survey and Outlook

    Authors: Moshira Abdalla, Sajid Javed, Muaz Al Radi, Anwaar Ulhaq, Naoufel Werghi

    Abstract: Video anomaly detection (VAD) holds immense importance across diverse domains such as surveillance, healthcare, and environmental monitoring. While numerous surveys focus on conventional VAD methods, they often lack depth in exploring specific approaches and emerging trends. This survey explores deep learning-based VAD, expanding beyond traditional supervised training paradigms to encompass emergi…

    Submitted 30 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2403.18933  [pdf, other]

    cs.CL

    SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

    Abstract: We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. The…

    Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638

  5. arXiv:2403.17848  [pdf, other]

    cs.CL cs.IR

    ArabicaQA: A Comprehensive Dataset for Arabic Question Answering

    Authors: Abdelrahman Abdallah, Mahmoud Kasem, Mahmoud Abdalla, Mohamed Mahmoud, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt

    Abstract: In this paper, we address the significant gap in Arabic natural language processing (NLP) resources by introducing ArabicaQA, the first large-scale dataset for machine reading comprehension and open-domain question answering in Arabic. This comprehensive dataset, consisting of 89,095 answerable and 3,701 unanswerable questions created by crowdworkers to look similar to answerable ones, along with…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at SIGIR 2024

  6. arXiv:2402.17488  [pdf, other]

    cs.CR

    Complexity Assessment of Analog Security Primitives Using the Disentropy of Autocorrelation

    Authors: Paul Jimenez, Raphael Cardoso, Maurìcio Gomes de Queiroz, Mohab Abdalla, Cédric Marchand, Xavier Letartre, Fabio Pavanello

    Abstract: The study of regularity in signals can be of great importance, typically in medicine to analyse electrocardiogram (ECG) or electromyography (EMG) signals, but also in climate studies, finance or security. In this work we focus on security primitives such as Physical Unclonable Functions (PUFs) or Pseudo-Random Number Generators (PRNGs). Such primitives must have a high level of complexity or entro…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 11 pages, 8 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  7. arXiv:2402.12046  [pdf, other]

    cs.DL cs.CL

    Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession

    Authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

    Abstract: This study examines the tendency to cite older work across 20 fields of study over 43 years (1980–2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, revea…

    Submitted 19 February, 2024; originally announced February 2024.

  8. arXiv:2402.09154  [pdf, other]

    cs.LG

    Attacking Large Language Models with Projected Gradient Descent

    Authors: Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Johannes Gasteiger, Stephan Günnemann

    Abstract: Current LLM alignment methods are readily broken through specifically crafted adversarial prompts. While crafting adversarial prompts using discrete optimization is highly effective, such attacks typically use more than 100,000 LLM calls. This high computational cost makes them unsuitable for, e.g., quantitative analyses and adversarial training. To remedy this, we revisit Projected Gradient Desce…

    Submitted 14 February, 2024; originally announced February 2024.

  9. arXiv:2402.08638  [pdf, other]

    cs.CL

    SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, et al. (2 additional authors not shown)

    Abstract: Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dat…

    Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to the Findings of ACL 2024

  10. arXiv:2312.03912  [pdf, other]

    cs.CL

    Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions

    Authors: Will Aitken, Mohamed Abdalla, Karen Rudie, Catherine Stinson

    Abstract: Impressive performance of pre-trained models has garnered public attention and made news headlines in recent years. Almost always, these models are produced by or in collaboration with industry. Using them is critical for competing on natural language processing (NLP) benchmarks and correspondingly to stay relevant in NLP research. We surveyed 100 papers published at EMNLP 2022 to determine the de…

    Submitted 22 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: ACL 2024 Main Conference

  11. arXiv:2311.12560  [pdf]

    cs.CV

    Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors

    Authors: Carolina A. M. Heming, Mohamed Abdalla, Monish Ahluwalia, Linglin Zhang, Hari Trivedi, MinJae Woo, Benjamin Fine, Judy Wawira Gichoya, Leo Anthony Celi, Laleh Seyyed-Kalantari

    Abstract: Clinical AI model reporting cards should be expanded to incorporate a broad bias reporting of both social and non-social factors. Non-social factors consider the role of other factors, such as disease dependent, anatomic, or instrument factors on AI model bias, which are essential to ensure safe deployment.

    Submitted 21 November, 2023; originally announced November 2023.

  12. We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

    Authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

    Abstract: Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on…

    Submitted 1 July, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Published at EMNLP 2023

    Journal ref: EMNLP 2023

  13. arXiv:2310.12155  [pdf]

    cs.NE

    Balancing exploration and exploitation phases in whale optimization algorithm: an insightful and empirical analysis

    Authors: Aram M. Ahmed, Tarik A. Rashid, Bryar A. Hassan, Jaffer Majidpour, Kaniaw A. Noori, Chnoor Maheadeen Rahman, Mohmad Hussein Abdalla, Shko M. Qader, Noor Tayfor, Naufel B Mohammed

    Abstract: Agents of any metaheuristic algorithms are moving in two modes, namely exploration and exploitation. Obtaining robust results in any algorithm is strongly dependent on how to balance between these two modes. Whale optimization algorithm as a robust and well recognized metaheuristic algorithm in the literature, has proposed a novel scheme to achieve this balance. It has also shown superior results…

    Submitted 3 September, 2023; originally announced October 2023.

    Comments: 11 pages

  14. arXiv:2310.07723  [pdf]

    cs.NE

    Equitable and Fair Performance Evaluation of Whale Optimization Algorithm

    Authors: Bryar A. Hassan, Tarik A. Rashid, Aram Ahmed, Shko M. Qader, Jaffer Majidpour, Mohmad Hussein Abdalla, Noor Tayfor, Hozan K. Hamarashid, Haval Sidqi, Kaniaw A. Noori

    Abstract: It is essential that all algorithms are exhaustively, somewhat, and intelligently evaluated. Nonetheless, evaluating the effectiveness of optimization algorithms equitably and fairly is not an easy process for various reasons. Choosing and initializing essential parameters, such as the size issues of the search area for each method and the number of iterations required to reduce the issues, might…

    Submitted 4 September, 2023; originally announced October 2023.

    Comments: 21 pages

    Journal ref: 2023

  15. arXiv:2309.09800  [pdf, other]

    cs.CL

    AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification

    Authors: Abdelrahman Abdallah, Mahmoud Abdalla, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt

    Abstract: The extraction of key information from receipts is a complex task that involves the recognition and extraction of text from scanned receipts. This process is crucial as it enables the retrieval of essential content and organizing it into structured documents for easy access and analysis. In this paper, we present AMuRD, a novel multilingual human-annotated dataset specifically designed for informa…

    Submitted 26 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

  16. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

    Authors: Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort

    Abstract: Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. As one of the big players in the field of NLP, together with governments and universities, it is important to track the influence of industry on research. In this study, we seek to quantify and characterize industry presence…

    Submitted 1 July, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023

    Journal ref: ACL 2023

  17. arXiv:2211.08469  [pdf, other]

    cs.CV

    Deep learning for table detection and structure recognition: A survey

    Authors: Mahmoud Kasem, Abdelrahman Abdallah, Alexander Berendeyev, Ebrahem Elkady, Mahmoud Abdalla, Mohamed Mahmoud, Mohamed Hamada, Daniyar Nurseitov, Islam Taj-Eddin

    Abstract: Tables are everywhere, from scientific journals, papers, websites, and newspapers all the way to items we buy at the supermarket. Detecting them is thus of utmost importance to automatically understanding the content of a document. The performance of table detection has substantially increased thanks to the rapid development of deep learning networks. The goals of this survey are to provide a prof…

    Submitted 15 November, 2022; originally announced November 2022.

  18. arXiv:2207.06245   

    eess.SP cs.NE

    Hitless memory-reconfigurable photonic reservoir computing architecture

    Authors: Mohab Abdalla, Clément Zrounba, Raphael Cardoso, Paul Jimenez, Guanghui Ren, Andreas Boes, Arnan Mitchell, Alberto Bosio, Ian O'Connor, Fabio Pavanello

    Abstract: Reservoir computing is an analog bio-inspired computation model for efficiently processing time-dependent signals, the photonic implementations of which promise a combination of massive parallel information processing, low power consumption, and high speed operation. However, most implementations, especially for the case of time-delay reservoir computing (TDRC), require signal attenuation in the r…

    Submitted 17 May, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: The paper has been withdrawn by the authors due to their belief that the arguments and results presented in the paper are not mature enough, and includes a slight error

  19. arXiv:2110.04845  [pdf, other]

    cs.CL

    What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study

    Authors: Mohamed Abdalla, Krishnapriya Vishnubhotla, Saif M. Mohammad

    Abstract: The degree of semantic relatedness of two units of language has long been considered fundamental to understanding meaning. Additionally, automatically determining relatedness has many applications such as question answering and summarization. However, prior NLP work has largely focused on semantic similarity, a subset of relatedness, because of a lack of relatedness datasets. In this paper, we int…

    Submitted 20 March, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted to EACL 2023; Our dataset, data statement, and annotation questionnaire can be found at: https://doi.org/10.5281/zenodo.7599667

  20. arXiv:2010.00153  [pdf, other]

    cs.CL

    Examining the rhetorical capacities of neural language models

    Authors: Zining Zhu, Chuer Pan, Mohamed Abdalla, Frank Rudzicz

    Abstract: Recently, neural language models (LMs) have demonstrated impressive abilities in generating high-quality discourse. While many recent papers have analyzed the syntactic aspects encoded in LMs, there has been no analysis to date of the inter-sentential, rhetorical knowledge. In this paper, we propose a method that quantitatively evaluates the rhetorical capacities of neural LMs. We examine the capa…

    Submitted 4 October, 2020; v1 submitted 30 September, 2020; originally announced October 2020.

    Comments: EMNLP 2020 BlackboxNLP Workshop

  21. The Grey Hoodie Project: Big Tobacco, Big Tech, and the threat on academic integrity

    Authors: Mohamed Abdalla, Moustafa Abdalla

    Abstract: As governmental bodies rely on academics' expert advice to shape policy regarding Artificial Intelligence, it is important that these academics not have conflicts of interests that may cloud or bias their judgement. Our work explores how Big Tech can actively distort the academic landscape to suit its needs. By comparing the well-studied actions of another industry (Big Tobacco) to the current act…

    Submitted 27 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Accepted to AIES-21

  22. arXiv:2003.11515  [pdf, other]

    cs.CL cs.CY cs.LG stat.ML

    Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings

    Authors: Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, Marzyeh Ghassemi

    Abstract: In this work, we examine the extent to which embeddings may encode marginalized populations differently, and how this may lead to a perpetuation of biases and worsened performance on clinical tasks. We pretrain deep embedding models (BERT) on medical notes from the MIMIC-III hospital dataset, and quantify potential disparities using two approaches. First, we identify dangerous latent relationships…

    Submitted 11 March, 2020; originally announced March 2020.

    Comments: Accepted at ACM CHIL 2020 (Spotlight)

  23. arXiv:1707.01626  [pdf, other]

    cs.CL

    Cross-Lingual Sentiment Analysis Without (Good) Translation

    Authors: Mohamed Abdalla, Graeme Hirst

    Abstract: Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships between words in a cross-lingual setting. We appl…

    Submitted 24 October, 2017; v1 submitted 5 July, 2017; originally announced July 2017.

    Comments: 10 pages, 4 figures

  24. arXiv:1707.00800  [pdf]

    cs.CV cs.IR

    Arabic Character Segmentation Using Projection Based Approach with Profile's Amplitude Filter

    Authors: Mahmoud A. A. Mousa, Mohammed S. Sayed, Mahmoud I. Abdalla

    Abstract: Arabic is one of the languages that present special challenges to Optical character recognition (OCR). The main challenge in Arabic is that it is mostly cursive. Therefore, a segmentation process must be carried out to determine where the character begins and where it ends. This step is essential for character recognition. This paper presents Arabic character segmentation algorithm. The proposed a…

    Submitted 3 July, 2017; originally announced July 2017.

  25. arXiv:1006.0831  [pdf]

    cs.SD

    Treatment the Effects of Studio Wall Resonance and Coincidence Phenomena for Recording Noisy Speech Via FPGA Digital Filter

    Authors: Mahmoud I. A. Abdalla

    Abstract: This work introduces an economic solution for the problems of sound insulation of recording studios. Sound insulation at wall resonance frequency is weak. Instead of acoustical treatment, a digital filter is used to eliminate the effects of wall resonance and coincidence phenomena on recording of speech. Sound insulation of studio is measured to calculate the wall resonance frequency and the coinc…

    Submitted 4 June, 2010; originally announced June 2010.

    Comments: Submitted to Journal of Telecommunications, see http://sites.google.com/site/journaloftelecommunications/volume-2-issue-2-may-2010

    Journal ref: Journal of Telecommunications, Volume 2, Issue 2, pp42-48, May 2010

  26. arXiv:1003.5627  [pdf]

    cs.SD cs.LG

    Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models

    Authors: Mahmoud I. Abdalla, Hanaa S. Ali

    Abstract: To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environment. Based on the time-frequency multi-resolution property of wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristic of the signal, the Mel-Frequency Cepstral Co…

    Submitted 29 March, 2010; originally announced March 2010.

    Journal ref: Journal of Telecommunications, Volume 1, Issue 2, pp16-21, March 2010

  27. arXiv:0711.3299  [pdf]

    cs.OH

    Influence of Micro-Cantilever Geometry and Gap on Pull-in Voltage

    Authors: W. Faris, H. Mohammed, M. M. Abdalla, C. -H. Ling

    Abstract: In this paper, we study the behaviour of a microcantilever beam under electrostatic actuation using finite difference method. This problem has a lot of applications in MEMS based devices like accelerometers, switches and others. In this paper, we formulated the problem of a cantilever beam with proof mass at its end and carried out the finite difference solution. We studied the effects of length…

    Submitted 21 November, 2007; originally announced November 2007.

    Comments: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)

    Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2006, Stresa, Lago Maggiore, Italy (2006)

  28. arXiv:cs/0610084  [pdf, ps, other]

    cs.NI cs.CR

    Share and Disperse: How to Resist Against Aggregator Compromises in Sensor Networks

    Authors: Thomas Claveirole, Marcelo Dias de Amorim, Michel Abdalla, Yannis Viniotis

    Abstract: A common approach to overcome the limited nature of sensor networks is to aggregate data at intermediate nodes. A challenging issue in this context is to guarantee end-to-end security mainly because sensor networks are extremely vulnerable to node compromises. In order to secure data aggregation, in this paper we propose three schemes that rely on multipath routing. The first one guarantees data…

    Submitted 13 October, 2006; originally announced October 2006.

    Comments: 9 pages, 3 figures, 2 tables