
You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Authors: Mehran Hosseini, Peyman Hosseini

Abstract: Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two matrix multiplications fewer per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers.
In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews datasets, as well as combined Europarl and Anki English-Spanish datasets for neural machine translation.

Submitted 30 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

MSC Class: 68T07 (Primary) 68T45; 68T50; 68T10; 15A03; 15A04 (Secondary) ACM Class: I.2.6; I.2.7; I.2.10; I.4.0; I.5.0; I.7.0

arXiv:2402.18919

Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation

Authors: Fahimeh Hosseini Noohdani, Parsa Hosseini, Aryan Yazdan Parast, Hamidreza Yaghoubi Araghi, Mahdieh Soleymani Baghshah

Abstract: While standard Empirical Risk Minimization (ERM) training is proven effective for image classification on in-distribution data, it fails to perform well on out-of-distribution samples. One of the main sources of distribution shift for image classification is the compositional nature of images. Specifically, in addition to the main object or component(s) determining the label, some other image components usually exist, which may lead to the shift of input distribution between train and test environments. More importantly, these components may have spurious correlations with the label. To address this issue, we propose Decompose-and-Compose (DaC), which improves robustness to correlation shift by a compositional approach based on combining elements of images. Based on our observations, models trained with ERM usually highly attend to either the causal components or the components having a high spurious correlation with the label (especially in datapoints on which models have a high confidence). In fact, according to the amount of spurious correlation and the easiness of classification based on the causal or non-causal components, the model usually attends to one of these more (on samples with high confidence). Following this, we first try to identify the causal components of images using class activation maps of models trained with ERM. Afterward, we intervene on images by combining them and retraining the model on the augmented data, including the counterfactual ones.
Along with its high interpretability, this work proposes a group-balancing method by intervening on images without requiring group labels or information regarding the spurious features during training. The method has an overall better worst group accuracy compared to previous methods with the same amount of supervision on the group labels in correlation shift.

Submitted 2 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2401.01951

Can We Generate Realistic Hands Only Using Convolution?

Authors: Mehran Hosseini, Peyman Hosseini

Abstract: The enduring inability of image generative models to recreate intricate geometric features, such as those present in human hands and fingers, has been an ongoing problem in image generation for nearly a decade. While strides have been made by increasing model sizes and diversifying training datasets, this issue remains prevalent across all models, from denoising diffusion models to Generative Adversarial Networks (GAN), pointing to a fundamental shortcoming in the underlying architectures. In this paper, we demonstrate how this problem can be mitigated by augmenting convolution layers' geometric capabilities through providing them with a single input channel incorporating the relative $n$-dimensional Cartesian coordinate system. We show that this drastically improves the quality of hand and face images generated by GANs and Variational AutoEncoders (VAE).

Submitted 3 January, 2024; originally announced January 2024.

Comments: Contains 17 pages, 14 figures, and 6 tables

MSC Class: 51 ACM Class: I.2.10; I.4.0; I.4.10

arXiv:2303.02468

doi 10.18653/v1/2023.semeval-1.185

Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction

Authors: Peyman Hosseini, Mehran Hosseini, Sana Sabah Al-Azzawi, Marcus Liwicki, Ignacio Castro, Matthew Purver

Abstract: We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.

Submitted 3 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted in ACL 2023 SemEval Workshop as selected task paper

ACM Class: I.2.7

arXiv:2205.12484

GisPy: A Tool for Measuring Gist Inference Score in Text

Authors: Pedram Hosseini, Christopher R. Wolfe, Mona Diab, David A. Broniatowski

Abstract: Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that individuals tend to rely on gist, or bottom-line meaning, in the text when making decisions. In this work, we delineate the process of developing GisPy, an open-source tool in Python for measuring the Gist Inference Score (GIS) in text. Evaluation of GisPy on documents in three benchmarks from the news and scientific text domains demonstrates that scores generated by our tool significantly distinguish low vs. high gist documents. Our tool is publicly available to use at: https://github.com/phosseini/GisPy.

Submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted to the 4th Workshop on Narrative Understanding @ NAACL 2022

arXiv:2112.08615

Knowledge-Augmented Language Models for Cause-Effect Relation Classification

Authors: Pedram Hosseini, David A. Broniatowski, Mona Diab

Abstract: Previous studies have shown the efficacy of knowledge augmentation methods in pretrained language models. However, these methods behave differently across domains and downstream tasks. In this work, we investigate the augmentation of pretrained language models with commonsense knowledge in the cause-effect relation classification and commonsense causal reasoning tasks. After automatically verbalizing ATOMIC2020, a wide coverage commonsense reasoning knowledge graph, and GLUCOSE, a dataset of implicit commonsense causal knowledge, we continually pretrain BERT and RoBERTa with the verbalized data. Then we evaluate the resulting models on cause-effect pair classification and answering commonsense causal reasoning questions. Our results show that continually pretrained language models augmented with commonsense knowledge outperform our baselines on two commonsense causal reasoning benchmarks, COPA and BCOPA-CE, and the Temporal and Causal Reasoning (TCR) dataset, without additional improvement in model architecture or using quality-enhanced data for fine-tuning.

Submitted 1 June, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: Accepted to Commonsense Representation and Reasoning (CSRR) @ ACL 2022

arXiv:2103.13606

Predicting Directionality in Causal Relations in Text

Authors: Pedram Hosseini, David A. Broniatowski, Mona Diab

Abstract: In this work, we test the performance of two bidirectional transformer-based language models, BERT and SpanBERT, on predicting directionality in causal pairs in the textual content. Our preliminary results show that predicting direction for inter-sentence and implicit causal relations is more challenging. Moreover, SpanBERT performs better than BERT on causal samples with longer span length. We also introduce CREST, which is a framework for unifying a collection of scattered datasets of causal relations.

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2012.06154

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Authors: Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks (Reading Comprehension, Textual Entailment, etc.). These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5$k$ new instances across 6 distinct NLU tasks. Besides, we present the first results on state-of-the-art monolingual and multi-lingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

Submitted 13 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: To appear in Transactions of the Association for Computational Linguistics (TACL), 2021

arXiv:2010.06671

A Multi-Modal Method for Satire Detection using Textual and Visual Cues

Authors: Lily Li, Or Levi, Pedram Hosseini, David A. Broniatowski

Abstract: Satire is a form of humorous critique, but it is sometimes misinterpreted by readers as legitimate news, which can lead to harmful consequences. We observe that the images used in satirical news articles often contain absurd or ridiculous content and that image manipulation is used to create fictional scenarios. While previous work has studied text-based methods, in this work we propose a multi-modal approach based on the state-of-the-art visiolinguistic model ViLBERT. To this end, we create a new dataset consisting of images and headlines of regular and satirical news for the task of satire detection. We fine-tune ViLBERT on the dataset and train a convolutional neural network that uses an image forensics technique. Evaluation on the dataset shows that our proposed multi-modal approach outperforms image-only, text-only, and simple fusion baselines.

Submitted 13 October, 2020; originally announced October 2020.

Comments: Accepted to the Third Workshop on NLP for Internet Freedom (NLP4IF): Censorship, Disinformation, and Propaganda. Co-located with COLING 2020

arXiv:2005.08400

Content analysis of Persian/Farsi Tweets during COVID-19 pandemic in Iran using NLP

Authors: Pedram Hosseini, Poorya Hosseini, David A. Broniatowski

Abstract: Iran, along with China, South Korea, and Italy, was among the countries that were hit hard in the first wave of the COVID-19 spread. Twitter is one of the online platforms widely used by Iranians inside and abroad for sharing their opinions, thoughts, and feelings about a wide range of issues. In this study, using more than 530,000 original tweets in Persian/Farsi on COVID-19, we analyzed the topics discussed among users, who are mainly Iranians, to gauge and track the response to the pandemic and how it evolved over time. We applied a combination of manual annotation of a random sample of tweets and topic modeling tools to classify the contents and frequency of each category of topics. We identified the top 25 topics, among which living experience under home quarantine emerged as a major talking point. We additionally categorized the broader content of tweets, which shows that satire, followed by news, is the dominant tweet type among the Iranian users. While this framework and methodology can be used to track public response to ongoing developments related to COVID-19, a generalization of this framework can become a useful framework to gauge Iranian public reaction to ongoing policy measures or events locally and internationally.

Submitted 17 May, 2020; originally announced May 2020.

arXiv:2004.09745

Automatically Identifying Political Ads on Facebook: Towards Understanding of Manipulation via User Targeting

Authors: Or Levi, Sardar Hamidian, Pedram Hosseini

Abstract: The reports of Russian interference in the 2016 United States elections brought into the center of public attention concerns related to the ability of foreign actors to increase social discord and take advantage of personal user data for political purposes. It has raised questions regarding the ways and the extent to which data can be used to create psychographical profiles to determine what kind of advertisement would be most effective to persuade a particular person in a particular location for some political event. In this work, we study the political ads dataset collected by ProPublica, an American nonprofit newsroom, using a network of volunteers in the period before the 2018 US midterm elections. We first describe the main characteristics of the data and explore the user attributes including age, region, activity, and more, with a series of interactive illustrations. Furthermore, an important first step towards understanding of political manipulation via user targeting is to identify politically related ads, yet manually checking ads is not feasible due to the scale of social media advertising. Consequently, we address the challenge of automatically classifying between political and non-political ads, demonstrating a significant improvement compared to the current text-based classifier used by ProPublica, and study whether the user targeting attributes are beneficial for this task.
Our evaluation sheds light on questions, such as how user attributes are being used for political ads targeting and which users are more prone to be targeted with political ads. Overall, our contribution of data exploration, political ad classification and initial analysis of the targeting attributes, is designed to support future work with the ProPublica dataset, and specifically with regard to the understanding of political manipulation via user targeting.

Submitted 21 April, 2020; originally announced April 2020.

Comments: Accepted to the 2nd Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM 2020)

arXiv:1911.05263

doi 10.29007/f4j4

LexiPers: An ontology based sentiment lexicon for Persian

Authors: Behnam Sabeti, Pedram Hosseini, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel

Abstract: Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach for sentiment extraction is using a sentiment lexicon. A sentiment lexicon is a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated based on a set of hand labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show an acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.

Submitted 12 November, 2019; originally announced November 2019.

arXiv:1910.01160

doi 10.18653/v1/D19-5004

Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues

Authors: Or Levi, Pedram Hosseini, Mona Diab, David A. Broniatowski

Abstract: The blurry line between nefarious fake news and protected-speech satire has been a notorious struggle for social media platforms. Further to the efforts of reducing exposure to misinformation on social media, purveyors of fake news have begun to masquerade as satire sites to avoid being demoted. In this work, we address the challenge of automatically classifying fake news versus satire. Previous work has studied whether fake news and satire can be distinguished based on language differences. Contrary to fake news, satire stories are usually humorous and carry some political or social message. We hypothesize that these nuances could be identified using semantic and linguistic cues. Consequently, we train a machine learning method using semantic representation, with a state-of-the-art contextual language model, and with linguistic features based on textual coherence metrics. Empirical evaluation attests to the merits of our approach compared to the language-based baseline and sheds light on the nuances between fake news and satire. As avenues for future work, we consider studying additional linguistic features related to the humor aspect, and enriching the data with current news events, to help identify a political or social message.

Submitted 5 November, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted to the 2nd Workshop on NLP for Internet Freedom (NLP4IF): Censorship, Disinformation, and Propaganda. Co-located with EMNLP-IJCNLP 2019

arXiv:1801.07737

SentiPers: A Sentiment Analysis Corpus for Persian

Authors: Pedram Hosseini, Ali Ahmadian Ramaki, Hassan Maleki, Mansoureh Anvari, Seyed Abolghasem Mirroshandel

Abstract: Sentiment Analysis (SA) is a major field of study in natural language processing, computational linguistics and information retrieval. Interest in SA has been constantly growing in both academia and industry over the recent years. Moreover, there is an increasing need for generating appropriate resources and datasets in particular for low resource languages including Persian. These datasets play an important role in designing and developing appropriate opinion mining platforms using supervised, semi-supervised or unsupervised methods. In this paper, we outline the entire process of developing a manually annotated sentiment corpus, SentiPers, which covers formal and informal written contemporary Persian. To the best of our knowledge, SentiPers is a unique sentiment corpus with such a rich annotation in three different levels including document-level, sentence-level, and entity/aspect-level for Persian. The corpus contains more than 26000 sentences of users' opinions from digital product domain and benefits from special characteristics such as quantifying the positiveness or negativity of an opinion through assigning a number within a specific range to any given sentence. Furthermore, we present statistics on various components of our corpus as well as studying the inter-annotator agreement among the annotators. Finally, some of the challenges that we faced during the annotation process will be discussed as well.

Submitted 1 January, 2021; v1 submitted 23 January, 2018; originally announced January 2018.

Comments: This work is accepted to the 3rd Conference on Computational Linguistics, Sharif University of Technology

Showing 1–14 of 14 results for author: Hosseini, P