
Showing 1–50 of 82 results for author: Mohammad, S

  1. arXiv:2403.19637  [pdf, other]

    q-bio.NC

    In the driver's mind: modeling the dynamics of human overtaking decisions in interactions with oncoming automated vehicles

    Authors: Samir H. A. Mohammad, Haneen Farah, Arkady Zgonnikov

    Abstract: Understanding human behavior in overtaking scenarios is crucial for enhancing road safety in mixed traffic with automated vehicles (AVs). Computational models of behavior play a pivotal role in advancing this understanding, as they can provide insight into human behavior generalizing beyond empirical studies. However, existing studies and models of human overtaking behavior have mostly focused on…

    Submitted 28 March, 2024; originally announced March 2024.

  2. arXiv:2403.18933  [pdf, other]

    cs.CL

    SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

    Abstract: We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. The…

    Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638

  3. arXiv:2403.13465  [pdf, other]

    eess.AS eess.SP

    BanglaNum -- A Public Dataset for Bengali Digit Recognition from Speech

    Authors: Mir Sayeed Mohammad, Azizul Zahid, Md Asif Iqbal

    Abstract: Automatic speech recognition (ASR) converts the human voice into readily understandable and categorized text or words. Although Bengali is one of the most widely spoken languages in the world, there have been very few studies on Bengali ASR, particularly on Bangladeshi-accented Bengali. In this study, audio recordings of spoken digits (0-9) from university students were used to create a Bengali sp…

    Submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.02474  [pdf, other]

    cs.CL

    The Emotion Dynamics of Literary Novels

    Authors: Krishnapriya Vishnubhotla, Adam Hammond, Graeme Hirst, Saif M. Mohammad

    Abstract: Stories are rich in the emotions they exhibit in their narratives and evoke in the readers. The emotional journeys of the various characters within a story are central to their appeal. Computational analysis of the emotions of novels, however, has rarely examined the variation in the emotional trajectories of the different characters within them, instead considering the entire novel to represent a…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 8 pages plus appendices

  5. arXiv:2403.02281  [pdf, other]

    cs.CL

    Emotion Granularity from Text: An Aggregate-Level Indicator of Mental Health

    Authors: Krishnapriya Vishnubhotla, Daniela Teodorescu, Mallory J. Feldman, Kristen A. Lindquist, Saif M. Mohammad

    Abstract: We are united in how emotions are central to shaping our experiences; and yet, individuals differ greatly in how we each identify, categorize, and express emotions. In psychology, variation in the ability of individuals to differentiate between emotion concepts is called emotion granularity (determined through self-reports of one's emotions). High emotion granularity has been linked with better me…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 9 pages plus appendices

  6. arXiv:2402.12046  [pdf, other]

    cs.DL cs.CL

    Citation Amnesia: NLP and Other Academic Fields Are in a Citation Age Recession

    Authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

    Abstract: This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, revea…

    Submitted 19 February, 2024; originally announced February 2024.

  7. arXiv:2402.08638  [pdf, other]

    cs.CL

    SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, et al. (2 additional authors not shown)

    Abstract: Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dat…

    Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to the Findings of ACL 2024

  8. arXiv:2310.17369  [pdf, other]

    cs.CL

    Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers

    Authors: Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif M. Mohammad

    Abstract: Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time -- emotion dynamics -- are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dyn…

    Submitted 4 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 9 pages, 5 figures

  9. We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

    Authors: Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

    Abstract: Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on…

    Submitted 1 July, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Published at EMNLP 2023

    Journal ref: EMNLP 2023

  10. arXiv:2308.06929  [pdf]

    cs.LG

    Predicting Listing Prices In Dynamic Short Term Rental Markets Using Machine Learning Models

    Authors: Sam Chapman, Seifey Mohammad, Kimberly Villegas

    Abstract: Our research group wanted to take on the difficult task of predicting prices in a dynamic market. Short-term rentals such as Airbnb listings seemed to be the perfect proving ground for this. Airbnb has revolutionized the travel industry by providing a platform for homeowners to rent out their properties to travelers. The pricing of Airbnb rentals is prone to high fluctuations, with p…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 40 pages, 10 tables, 12 figures

  11. arXiv:2306.05387  [pdf, other]

    cs.CL

    Utterance Emotion Dynamics in Children's Poems: Emotional Changes Across Age

    Authors: Daniela Teodorescu, Alona Fyshe, Saif M. Mohammad

    Abstract: Emerging psychopathology studies are showing that patterns of changes in emotional state -- emotion dynamics -- are associated with overall well-being and mental health. More recently, there has been some work in tracking emotion dynamics through one's utterances, allowing for data to be collected on a larger scale across time and people. However, several questions about how emotion dynamics chang…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 15 pages, 8 figures

  12. arXiv:2306.05203  [pdf, other]

    q-bio.NC cs.RO

    A cognitive process approach to modeling gap acceptance in overtaking

    Authors: Samir H. A. Mohammad, Haneen Farah, Arkady Zgonnikov

    Abstract: Driving automation holds significant potential for enhancing traffic safety. However, effectively handling interactions with human drivers in mixed traffic remains a challenging task. Several models exist that attempt to capture human behavior in traffic interactions, often focusing on gap acceptance. However, it is not clear how models of an individual driver's gap acceptance can be translated to…

    Submitted 8 June, 2023; originally announced June 2023.

  13. arXiv:2306.02213  [pdf, other]

    cs.CL

    Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis

    Authors: Daniela Teodorescu, Saif M. Mohammad

    Abstract: Emotion arcs capture how an individual (or a population) feels over time. They are widely used in industry and research; however, there is little work on evaluating the automatically generated arcs. This is because of the difficulty of establishing the true (gold) emotion arc. Our work, for the first time, systematically and quantitatively evaluates automatically generated emotion arcs. We also co…

    Submitted 4 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: 9 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2210.07381

  14. arXiv:2305.18554  [pdf, other]

    cs.CL cs.DL

    Forgotten Knowledge: Examining the Citational Amnesia in NLP

    Authors: Janvijay Singh, Mukund Rungta, Diyi Yang, Saif M. Mohammad

    Abstract: Citing papers is the primary method through which modern scientific writing discusses and builds on past work. Collectively, citing a diverse set of papers (in time and area of study) is an indicator of how widely the community is reading. Yet, there is little work looking at broad temporal patterns of citation. This work systematically and empirically examines: How far back in time do we tend to…

    Submitted 31 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Main Conference

  15. arXiv:2305.12920  [pdf, other]

    cs.CL

    A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why?

    Authors: Aniket Pramanick, Yufang Hou, Saif M. Mohammad, Iryna Gurevych

    Abstract: Understanding the fundamental concepts and trends in a scientific field is crucial for keeping abreast of its continuous advancement. In this study, we propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NL…

    Submitted 25 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: accepted at EMNLP 2023

  16. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

    Authors: Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort

    Abstract: Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. As one of the big players in the field of NLP, together with governments and universities, it is important to track the influence of industry on research. In this study, we seek to quantify and characterize industry presence…

    Submitted 1 July, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Published at ACL 2023

    Journal ref: ACL 2023

  17. arXiv:2304.06845  [pdf, other]

    cs.CL

    SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)

    Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif M. Mohammad, Meriem Beloucif, Sebastian Ruder

    Abstract: We present the first Africentric SemEval Shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval). The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oro…

    Submitted 1 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 19 pages, 5 figures, 6 tables

  18. arXiv:2303.18190  [pdf, other]

    cs.CL

    Assessing Language Model Deployment with Risk Cards

    Authors: Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad

    Abstract: This paper introduces RiskCards, a framework for structured assessment and documentation of risks associated with an application of language models. As with all language, text generated by language models can be harmful, or used to bring about harm. Automating language generation adds both an element of scale and also more subtle or emergent undesirable tendencies to the generated text. Prior work…

    Submitted 31 March, 2023; originally announced March 2023.

  19. arXiv:2303.03886  [pdf, other]

    cs.CY

    AI Usage Cards: Responsibly Reporting AI-generated Content

    Authors: Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Norman Meuschke, Bela Gipp

    Abstract: Given AI systems like ChatGPT can generate content that is indistinguishable from human-made work, the responsible use of this technology is a growing concern. Although understanding the benefits and harms of using AI systems requires more time, their rapid and indiscriminate adoption in practice is a reality. Currently, we lack a common framework and language to define and report the responsible…

    Submitted 9 May, 2023; v1 submitted 16 February, 2023; originally announced March 2023.

  20. arXiv:2302.08956  [pdf, other]

    cs.CL

    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

    Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, et al. (1 additional author not shown)

    Abstract: Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti…

    Submitted 4 November, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 14 pages, 3 Figures, 10 Tables

  21. arXiv:2212.13144  [pdf, ps, other]

    stat.ME

    Statistical inference with normal-compound gamma priors in regression models

    Authors: Ahmed Alhamzawi, Gorgees Shaheed Mohammad

    Abstract: Scale-mixture shrinkage priors have recently been shown to possess robust empirical performance and excellent theoretical properties such as model selection consistency and (near) minimax posterior contraction rates. In this paper, the normal-compound gamma prior (NCG) resulting from compounding on the respective inverse-scale parameters with gamma distribution is used as a prior for the scale par…

    Submitted 23 December, 2022; originally announced December 2022.

  22. arXiv:2211.03109  [pdf, other]

    eess.IV cs.CV

    A Sequence Agnostic Multimodal Preprocessing for Clogged Blood Vessel Detection in Alzheimer's Diagnosis

    Authors: Partho Ghosh, Md. Abrar Istiak, Mir Sayeed Mohammad, Swapnil Saha, Uday Kamal

    Abstract: Successful identification of blood vessel blockage is a crucial step for Alzheimer's disease diagnosis. These blocks can be identified from the spatial and time-depth variable Two-Photon Excitation Microscopy (TPEF) images of the brain blood vessels using machine learning methods. In this study, we propose several preprocessing schemes to improve the performance of these methods. Our method includ…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures

  23. arXiv:2210.14424  [pdf, other]

    cs.CL

    Geographic Citation Gaps in NLP Research

    Authors: Mukund Rungta, Janvijay Singh, Saif M. Mohammad, Diyi Yang

    Abstract: In a fair world, people have equitable opportunities to education, to conduct scientific research, to publish, and to get credit for their work, regardless of where they live. However, it is common knowledge among researchers that a vast number of papers accepted at top NLP venues come from a handful of western countries and (lately) China; whereas, very few papers from Africa and South America ge…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Main Conference

  24. arXiv:2210.07381  [pdf, other]

    cs.CL

    Frustratingly Easy Sentiment Analysis of Text Streams: Generating High-Quality Emotion Arcs Using Emotion Lexicons

    Authors: Daniela Teodorescu, Saif M. Mohammad

    Abstract: Automatically generated emotion arcs -- that capture how an individual or a population feels over time -- are widely used in industry and research. However, there is little work on evaluating the generated arcs. This is in part due to the difficulty of establishing the true (gold) emotion arc. Our work, for the first time, systematically and quantitatively evaluates automatically generated emotion…

    Submitted 13 October, 2022; originally announced October 2022.

  25. arXiv:2210.07206  [pdf, other]

    cs.CL

    Best Practices in the Creation and Use of Emotion Lexicons

    Authors: Saif M. Mohammad

    Abstract: Words play a central role in how we express ourselves. Lexicons of word-emotion associations are widely used in research and real-world applications for sentiment analysis, tracking emotions associated with products and policies, studying health disorders, tracking emotional arcs of stories, and so on. However, inappropriate and incorrect use of these lexicons can lead to not just sub-optimal resu…

    Submitted 8 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Best Practices in the Creation and Use of Emotion Lexicons. Saif M. Mohammad. EACL, 2023, Dubrovnik, Croatia

  26. arXiv:2210.06878  [pdf, other]

    cs.CL cs.DL

    CS-Insights: A System for Analyzing Computer Science Research

    Authors: Terry Ruas, Jan Philip Wahle, Lennart Küll, Saif M. Mohammad, Bela Gipp

    Abstract: This paper presents CS-Insights, an interactive web application to analyze computer science publications from DBLP through multiple perspectives. The dedicated interfaces allow its users to identify trends in research activity, productivity, accessibility, author's productivity, venues' statistics, topics of interest, and the impact of computer science research on other fields. CS-Insights is publi…

    Submitted 29 January, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

  27. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  28. arXiv:2204.13384  [pdf, other]

    cs.DL cs.CL

    D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research

    Authors: Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp

    Abstract: DBLP is the largest open-access repository of scientific articles on computer science and provides metadata associated with publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trend…

    Submitted 10 November, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Journal ref: LREC 2022

  29. arXiv:2204.04862  [pdf, other]

    cs.CL

    Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada

    Authors: Krishnapriya Vishnubhotla, Saif M. Mohammad

    Abstract: Over the last decade, Twitter has emerged as one of the most influential forums for social, political, and health discourse. In this paper, we introduce a massive dataset of more than 45 million geo-located tweets posted between 2015 and 2021 from US and Canada (TUSC), especially curated for natural language analysis. We also introduce Tweet Emotion Dynamics (TED) -- metrics to capture patterns of…

    Submitted 4 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted for publication at LREC 2022 (camera-ready)

  30. arXiv:2110.04845  [pdf, other]

    cs.CL

    What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study

    Authors: Mohamed Abdalla, Krishnapriya Vishnubhotla, Saif M. Mohammad

    Abstract: The degree of semantic relatedness of two units of language has long been considered fundamental to understanding meaning. Additionally, automatically determining relatedness has many applications such as question answering and summarization. However, prior NLP work has largely focused on semantic similarity, a subset of relatedness, because of a lack of relatedness datasets. In this paper, we int…

    Submitted 20 March, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Accepted to EACL 2023; Our dataset, data statement, and annotation questionnaire can be found at: https://doi.org/10.5281/zenodo.7599667

  31. arXiv:2109.08256  [pdf, other]

    cs.CL cs.AI

    Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis

    Authors: Saif M. Mohammad

    Abstract: The importance and pervasiveness of emotions in our lives make affective computing a tremendously important and vibrant line of work. Systems for automatic emotion recognition (AER) and sentiment analysis can be facilitators of enormous progress (e.g., in improving public health and commerce) but also enablers of great harm (e.g., for suppressing dissidents and manipulating voters). Thus, it is i…

    Submitted 20 March, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: To Appear in Computational Linguistics, June 2022

  32. arXiv:2107.01183  [pdf, other]

    cs.AI cs.CL

    Ethics Sheets for AI Tasks

    Authors: Saif M. Mohammad

    Abstract: Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and using question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this positio…

    Submitted 19 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL-2022), May 2022, Dublin, Ireland

  33. arXiv:2106.05664  [pdf, other]

    cs.CL cs.AI

    Ruddit: Norms of Offensiveness for English Reddit Comments

    Authors: Rishav Hada, Sohi Sudhir, Pushkar Mishra, Helen Yannakoudakis, Saif M. Mohammad, Ekaterina Shutova

    Abstract: On social media platforms, hateful and offensive language negatively impacts the mental well-being of users and the participation of people from diverse backgrounds. Automatic methods to detect offensive language have largely relied on datasets with categorical labels. However, comments can vary in their degree of offensiveness. We create the first dataset of English language Reddit comments that h…

    Submitted 25 January, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Camera-ready version in ACL 2021

  34. Emotion Dynamics in Movie Dialogues

    Authors: Will E. Hipson, Saif M. Mohammad

    Abstract: Emotion dynamics is a framework for measuring how an individual's emotions change over time. It is a powerful tool for understanding how we behave and interact with the world. In this paper, we introduce a framework to track emotion dynamics through one's utterances. Specifically, we introduce a number of utterance emotion dynamics (UED) metrics inspired by work in Psychology. We use this approach…

    Submitted 6 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: 19 pages, 7 figures

  35. arXiv:2102.11356  [pdf]

    cs.CV

    Shadow Image Enlargement Distortion Removal

    Authors: Raid R. Al-Nima, Ali N. Hamoodi, Radhwan Y. Al-Jawadi, Ziad S. Mohammad

    Abstract: This project aims to adopt preprocessing operations that reduce distortions in shadow image enlargement. The preprocessing operations consist of three main steps: first, enlarge the original shadow image using any interpolation method; second, apply an average filter to the enlarged image; and finally, apply an unsharp filter to the averaged image. These preprocessing operation…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: 7 pages, 6 figures and 3 Tables

  36. arXiv:2011.03492  [pdf, ps, other]

    cs.CL

    Practical and Ethical Considerations in the Effective use of Emotion and Sentiment Lexicons

    Authors: Saif M. Mohammad

    Abstract: Lexicons of word-emotion associations are widely used in research and real-world applications. As part of my research, I have created several such lexicons (e.g., the NRC Emotion Lexicon). This paper outlines some practical and ethical considerations involved in the effective use of these lexical resources.

    Submitted 9 December, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

  37. arXiv:2006.03096  [pdf, other]

    cs.CL cs.CY

    SOLO: A Corpus of Tweets for Examining the State of Being Alone

    Authors: Svetlana Kiritchenko, Will E. Hipson, Robert J. Coplan, Saif M. Mohammad

    Abstract: The state of being alone can have a substantial impact on our lives, though experiences with time alone diverge significantly among individuals. Psychologists distinguish between the concept of solitude, a positive state of voluntary aloneness, and the concept of loneliness, a negative state of dissatisfaction with the quality of one's social interactions. Here, for the first time, we conduct a la…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: In Proceedings of the 12th edition of the Language Resources and Evaluation Conference (LREC), May 2020

  38. arXiv:2006.01131  [pdf, other]

    cs.DL cs.CL

    NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature

    Authors: Saif M. Mohammad

    Abstract: As part of the NLP Scholar project, we created a single unified dataset of NLP papers and their meta-information (including citation numbers), by extracting and aligning information from the ACL Anthology and Google Scholar. In this paper, we describe several interconnected interactive visualizations (dashboards) that present various aspects of the data. Clicking on an item within a visualization…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: arXiv admin note: text overlap with arXiv:2005.00912

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020

  39. arXiv:2005.11882  [pdf, other]

    cs.CL

    Sentiment Analysis: Automatically Detecting Valence, Emotions, and Other Affectual States from Text

    Authors: Saif M. Mohammad

    Abstract: Recent advances in machine learning have led to computer systems that are human-like in behaviour. Sentiment analysis, the automatic determination of emotions in text, is allowing us to capitalize on substantial previously unattainable opportunities in commerce, public health, government policy, social sciences, and art. Further, analysis of emotions in text, from news to social media posts, is im…

    Submitted 13 January, 2021; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: This is the author's manuscript of what is slated to appear in the Second Edition of Emotion Measurement, 2021

    Journal ref: Second Edition of Emotion Measurement, 2021

  40. arXiv:2005.00962  [pdf, other]

    cs.DL cs.CL

    Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations

    Authors: Saif M. Mohammad

    Abstract: Disparities in authorship and citations across gender can have substantial adverse consequences not just on the disadvantaged genders, but also on the field of study as a whole. Measuring gender gaps is a crucial step towards addressing them. In this work, we examine female first author percentages and the citations to their papers in Natural Language Processing (1965 to 2019). We determine aggreg…

    Submitted 3 September, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). July 2020. Seattle, USA

  41. arXiv:2005.00912

    cs.DL cs.CL

    Examining Citations of Natural Language Processing Literature

    Authors: Saif M. Mohammad

    Abstract: We extracted information from the ACL Anthology (AA) and Google Scholar (GS) to examine trends in citations of NLP papers. We explore questions such as: how well cited are papers of different types (journal articles, conference papers, demo papers, etc.)? how well cited are papers from different areas within NLP? etc. Notably, we show that only about 56% of the papers in AA are cited ten or mo…

    Submitted 2 May, 2020; originally announced May 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). July 2020. Seattle, USA

  42. arXiv:2004.13559

    eess.SP eess.SY

    Kalman Filter and Wavelet Cross-correlation for VHF Broadband Interferometer Lightning Mapping

    Authors: Ammar Alammari, Ammar Alkahtani, Mohd Riduan, Fuad Noman, Mona Riza Mohd Esa, Muhammad Haziq Mohamad Sabri, Sulaiman Ali Mohammad, Ahmed Salih Al-Khaleefa, Zen Kawasaki, Vassilios Agelidis

    Abstract: A lightning mapping system based on perpendicular crossed baseline interferometer (ITF) technology has been developed rapidly in recent years. Several processing methods have been proposed to estimate the temporal location and spatial map of lightning strikes. In this paper, a single very high frequency (VHF) interferometer is used to simulate and augment the lightning maps. We perform a comparati…

    Submitted 25 June, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: 13 pages, 10 figures

  43. arXiv:2004.06188

    cs.CL

    PoKi: A Large Dataset of Poems by Children

    Authors: Will E. Hipson, Saif M. Mohammad

    Abstract: Child language studies are crucial in improving our understanding of child well-being, especially in determining the factors that impact happiness, the sources of anxiety, techniques of emotion regulation, and the mechanisms to cope with stress. However, much of this research is stymied by the lack of availability of large child-written texts. We present a new corpus of child-written text, PoKi, w…

    Submitted 2 May, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France

  44. arXiv:1912.02387

    cs.CL cs.IR cs.LG

    SemEval-2015 Task 10: Sentiment Analysis in Twitter

    Authors: Sara Rosenthal, Saif M Mohammad, Preslav Nakov, Alan Ritter, Svetlana Kiritchenko, Veselin Stoyanov

    Abstract: In this paper, we describe the 2015 iteration of the SemEval shared task on Sentiment Analysis in Twitter. This was the most popular sentiment analysis shared task to date with more than 40 teams participating in each of the last three years. This year's shared task competition consisted of five sentiment prediction subtasks. Two were reruns from previous years: (A) sentiment expressed by a phrase…

    Submitted 5 December, 2019; originally announced December 2019.

    Comments: Sentiment analysis, sentiment towards a topic, quantification, microblog sentiment analysis; Twitter opinion mining

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: SemEval-2015

  45. arXiv:1911.03562

    cs.DL cs.CL

    The State of NLP Literature: A Diachronic Analysis of the ACL Anthology

    Authors: Saif M. Mohammad

    Abstract: The ACL Anthology (AA) is a digital repository of tens of thousands of articles on Natural Language Processing (NLP). This paper examines the literature as a whole to identify broad trends in productivity, focus, and impact. It presents the analyses in a sequence of questions and answers. The goal is to record the state of the AA literature: who and how many of us are publishing? what are we publi…

    Submitted 8 November, 2019; originally announced November 2019.

  46. The Natural Selection of Words: Finding the Features of Fitness

    Authors: Peter D. Turney, Saif M. Mohammad

    Abstract: We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency…

    Submitted 19 August, 2019; originally announced August 2019.

    ACM Class: I.2.6; I.2.7

    Journal ref: Published in PLOS ONE, 14(1), e0211512, January 28, 2019

  47. arXiv:1809.01083

    cs.CL

    IEST: WASSA-2018 Implicit Emotions Shared Task

    Authors: Roman Klinger, Orphée De Clercq, Saif M. Mohammad, Alexandra Balahur

    Abstract: Past shared tasks on emotions use data with both overt expressions of emotions (I am so happy to see you!) as well as subtle expressions where the emotions have to be inferred, for instance from event descriptions. Further, most datasets do not focus on the cause or the stimulus of the emotion. Here, for the first time, we propose a shared task where systems have to predict the emotions in a large…

    Submitted 5 September, 2018; v1 submitted 4 September, 2018; originally announced September 2018.

    Comments: Accepted at the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

  48. arXiv:1805.04558

    cs.CL

    NRC-Canada at SMM4H Shared Task: Classifying Tweets Mentioning Adverse Drug Reactions and Medication Intake

    Authors: Svetlana Kiritchenko, Saif M. Mohammad, Jason Morin, Berry de Bruijn

    Abstract: Our team, NRC-Canada, participated in two shared tasks at the AMIA-2017 Workshop on Social Media Mining for Health Applications (SMM4H): Task 1 - classification of tweets mentioning adverse drug reactions, and Task 2 - classification of tweets describing personal medication intake. For both tasks, we trained Support Vector Machine classifiers using a variety of surface-form, sentiment, and domain-…

    Submitted 11 May, 2018; originally announced May 2018.

    Comments: In Proceedings of the Social Media Mining for Health Applications Workshop at AMIA-2017, Washington, DC, USA, 2017

  49. arXiv:1805.04542

    cs.CL

    Sentiment Composition of Words with Opposing Polarities

    Authors: Svetlana Kiritchenko, Saif M. Mohammad

    Abstract: In this paper, we explore sentiment composition in phrases that have at least one positive and at least one negative word: phrases like 'happy accident' and 'best winter break'. We compiled a dataset of such opposing polarity phrases and manually annotated them with real-valued scores of sentiment association. Using this dataset, we analyze the linguistic patterns present in opposing polarity phr…

    Submitted 11 May, 2018; originally announced May 2018.

    Comments: In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), San Diego, California, 2016

  50. arXiv:1805.04508

    cs.CL

    Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems

    Authors: Svetlana Kiritchenko, Saif M. Mohammad

    Abstract: Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences c…

    Submitted 11 May, 2018; originally announced May 2018.

    Comments: In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics (*SEM), New Orleans, USA, 2018