Search | arXiv e-print repository

Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement

Authors: Samuel Mensah, Kai Sun, Nikolaos Aletras

Abstract: State-of-the-art target-oriented opinion word extraction (TOWE) models typically use BERT-based text encoders that operate on the word level, along with graph convolutional networks (GCNs) that incorporate syntactic information extracted from syntax trees. These methods achieve limited gains with GCNs and have difficulty using BERT wordpieces. Meanwhile, BERT wordpieces are known to be effective a… ▽ More State-of-the-art target-oriented opinion word extraction (TOWE) models typically use BERT-based text encoders that operate on the word level, along with graph convolutional networks (GCNs) that incorporate syntactic information extracted from syntax trees. These methods achieve limited gains with GCNs and have difficulty using BERT wordpieces. Meanwhile, BERT wordpieces are known to be effective at representing rare words or words with insufficient context information. To address this issue, this work trades syntax trees for BERT wordpieces by entirely removing the GCN component from the methods' architectures. To enhance TOWE performance, we tackle the issue of aspect representation loss during encoding. Instead of solely utilizing a sentence as the input, we use a sentence-aspect pair. Our relatively simple approach achieves state-of-the-art results on benchmark datasets and should serve as a strong baseline for further research. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023

arXiv:2302.14719 [pdf, other]

Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction

Authors: Kai Sun, Richong Zhang, Samuel Mensah, Nikolaos Aletras, Yongyi Mao, Xudong Liu

Abstract: Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focus on cross-domain OTE, which is typically encountered in real-world scenarios, where the testing and training distributions differ. Most methods use domain adversarial neural networks that aim to reduc… ▽ More Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focus on cross-domain OTE, which is typically encountered in real-world scenarios, where the testing and training distributions differ. Most methods use domain adversarial neural networks that aim to reduce the domain gap between the labelled source and unlabelled target domains to improve target domain performance. However, this approach only aligns feature distributions and does not account for class-wise feature alignment, leading to suboptimal results. Semi-supervised learning (SSL) has been explored as a solution, but is limited by the quality of pseudo-labels generated by the model. Inspired by the theoretical foundations in domain adaptation [2], we propose a new SSL approach that opts for selecting target samples whose model output from a domain-specific teacher and student network disagree on the unlabelled target data, in an effort to boost the target domain performance. Extensive experiments on benchmark cross-domain OTE datasets show that this approach is effective and performs consistently well in settings with large domain shifts. △ Less

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted at TheWebConf 2023

arXiv:2204.10293 [pdf, other]

A Hierarchical N-Gram Framework for Zero-Shot Link Prediction

Authors: Mingchen Li, Junfan Chen, Samuel Mensah, Nikolaos Aletras, Xiulong Yang, Yang Ye

Abstract: Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP) which aims to predict unobserved relations in KGs has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embeddi… ▽ More Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP) which aims to predict unobserved relations in KGs has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness as they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams have the capability of generating expressive representations for OOV words. Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP. Our approach works by first constructing a hierarchical n-gram graph on the surface name to model the organizational structure of n-grams that leads to the surface name. A GramTransformer, based on the Transformer is then presented to model the hierarchical n-gram graph to construct the relation embedding for ZSLP. Experimental results show the proposed HNZSLP achieved state-of-the-art performance on two ZSLP datasets. △ Less

Submitted 6 January, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

Comments: Published as a conference paper at EMNLP Findings 2022

arXiv:2109.01238 [pdf, other]

An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction

Authors: Samuel Mensah, Kai Sun, Nikolaos Aletras

Abstract: Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this informati… ▽ More Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations while using a GCN only achieves marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures. △ Less

Submitted 2 September, 2021; originally announced September 2021.

Comments: Accepted at EMNLP 2021

ACM Class: I.2.7

arXiv:2105.14179 [pdf]

doi 10.1109/QRS.2017.44

Investigating the Significance of Bellwether Effect to Improve Software Effort Estimation

Authors: Solomon Mensah, Jacky Keung, Stephen G. MacDonell, Michael F. Bosu, Kwabena E. Bennin

Abstract: Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the Bellwether effect on software effort estimation acc… ▽ More Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the Bellwether effect on software effort estimation accuracy using moving windows. The existence of the Bellwether was empirically proven based on six postulations. We apply statistical stratification and Markov chain methodology to select the Bellwether moving window. The resulting Bellwether moving window is used to predict the software effort of a new project. Empirical results show that Bellwether effect exist in chronological datasets with a set of exemplary and recently completed projects representing the Bellwether moving window. Result from this study has shown that the use of Bellwether moving window with the Gaussian weighting function significantly improve the prediction accuracy. △ Less

Submitted 28 May, 2021; originally announced May 2021.

Comments: Conference paper, 13 papers, 3 figures, 9 tables. arXiv admin note: text overlap with arXiv:2105.07366

Journal ref: Proceedings of the 2017 International Conference on Software Quality, Reliability and Security (QRS2017). Prague, Czech Republic, IEEE Computer Society Press, pp.340-351

arXiv:2105.07366 [pdf]

doi 10.1109/TR.2018.2839718

Investigating the Significance of the Bellwether Effect to Improve Software Effort Prediction: Further Empirical Study

Authors: Solomon Mensah, Jacky Keung, Stephen G. MacDonell, Michael Franklin Bosu, Kwabena Ebo Bennin

Abstract: Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the BMW yields improved accuracy in general, since dif… ▽ More Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the BMW yields improved accuracy in general, since different sizing and aging parameters of the BMW are known to affect accuracy. Objective: To investigate the existence of exemplary projects (Bellwethers) with defined window size and age parameters, and whether their use in SEP improves prediction accuracy. Method: We empirically investigate the moving window assumption based on the theory that the prediction outcome of a future event depends on the outcomes of prior events. Sampling of Bellwethers was undertaken using three introduced Bellwether methods (SSPM, SysSam, and RandSam). The ergodic Markov chain was used to determine the stationarity of the Bell-wethers. Results: Empirical results show that 1) Bellwethers exist in SEP and 2) the BMW has an approximate size of 50 to 80 exemplary projects that should not be more than 2 years old relative to the new projects to be estimated. Conclusion: The study's results add further weight to the recommended use of Bellwethers for improved prediction accuracy in SEP. △ Less

Submitted 16 May, 2021; originally announced May 2021.

Comments: Journal paper, 23 pages, 8 figures, 9 tables

Journal ref: IEEE Transactions on Reliability 67(3)(2018), pp.1176-1198

arXiv:2103.15963 [pdf, ps, other]

Contextual Text Embeddings for Twi

Authors: Paul Azunre, Salomey Osei, Salomey Addo, Lawrence Asamoah Adu-Gyamfi, Stephen Moore, Bernard Adabankah, Bernard Opoku, Clara Asare-Nyarko, Samuel Nyarko, Cynthia Amoaba, Esther Dansoa Appiah, Felix Akwerh, Richard Nii Lante Lawson, Joel Budu, Emmanuel Debrah, Nana Boateng, Wisdom Ofori, Edwin Buabeng-Munkoh, Franklin Adjei, Isaac Kojo Essel Ampomah, Joseph Otoo, Reindorf Borkor, Standylove Birago Mensah, Lucien Mensah, Mark Amoako Marcel , et al. (2 additional authors not shown)

Abstract: Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, Russian, etc. However, this technology does not yet exist for any Ghanaian language. In this paper, we introduce the first of such models for Twi or Akan, the most widely spoken Ghanaian language. The specific contribution of this researc… ▽ More Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, Russian, etc. However, this technology does not yet exist for any Ghanaian language. In this paper, we introduce the first of such models for Twi or Akan, the most widely spoken Ghanaian language. The specific contribution of this research work is the development of several pretrained transformer language models for the Akuapem and Asante dialects of Twi, paving the way for advances in application areas such as Named Entity Recognition (NER), Neural Machine Translation (NMT), Sentiment Analysis (SA) and Part-of-Speech (POS) tagging. Specifically, we introduce four different flavours of ABENA -- A BERT model Now in Akan that is fine-tuned on a set of Akan corpora, and BAKO - BERT with Akan Knowledge only, which is trained from scratch. We open-source the model through the Hugging Face model hub and demonstrate its use via a simple sentiment classification example. △ Less

Submitted 31 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: 10 pages paper; Accepted at African NLP Workshop @ EACL 2021

arXiv:2103.15625 [pdf, other]

English-Twi Parallel Corpus for Machine Translation

Authors: Paul Azunre, Salomey Osei, Salomey Addo, Lawrence Asamoah Adu-Gyamfi, Stephen Moore, Bernard Adabankah, Bernard Opoku, Clara Asare-Nyarko, Samuel Nyarko, Cynthia Amoaba, Esther Dansoa Appiah, Felix Akwerh, Richard Nii Lante Lawson, Joel Budu, Emmanuel Debrah, Nana Boateng, Wisdom Ofori, Edwin Buabeng-Munkoh, Franklin Adjei, Isaac Kojo Essel Ampomah, Joseph Otoo, Reindorf Borkor, Standylove Birago Mensah, Lucien Mensah, Mark Amoako Marcel , et al. (2 additional authors not shown)

Abstract: We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher quality crowd-sourced sentences are provided for use a… ▽ More We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher quality crowd-sourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is for further training of machine translation models in Akuapem Twi. The higher quality 697 crowd-sourced dataset is recommended as a testing dataset for machine translation of English to Twi and Twi to English models. Furthermore, the Twi part of the crowd-sourced data may also be used for other tasks, such as representation learning, classification, etc. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set. △ Less

Submitted 1 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: 9 pages paper, Accepted at African NLP workshop @EACL 2021

arXiv:2103.15475 [pdf, ps, other]

NLP for Ghanaian Languages

Authors: Paul Azunre, Salomey Osei, Salomey Addo, Lawrence Asamoah Adu-Gyamfi, Stephen Moore, Bernard Adabankah, Bernard Opoku, Clara Asare-Nyarko, Samuel Nyarko, Cynthia Amoaba, Esther Dansoa Appiah, Felix Akwerh, Richard Nii Lante Lawson, Joel Budu, Emmanuel Debrah, Nana Boateng, Wisdom Ofori, Edwin Buabeng-Munkoh, Franklin Adjei, Isaac Kojo Essel Ampomah, Joseph Otoo, Reindorf Borkor, Standylove Birago Mensah, Lucien Mensah, Mark Amoako Marcel , et al. (2 additional authors not shown)

Abstract: NLP Ghana is an open-source non-profit organization aiming to advance the development and adoption of state-of-the-art NLP techniques and digital language tools to Ghanaian languages and problems. In this paper, we first present the motivation and necessity for the efforts of the organization; by introducing some popular Ghanaian languages while presenting the state of NLP in Ghana. We then presen… ▽ More NLP Ghana is an open-source non-profit organization aiming to advance the development and adoption of state-of-the-art NLP techniques and digital language tools to Ghanaian languages and problems. In this paper, we first present the motivation and necessity for the efforts of the organization; by introducing some popular Ghanaian languages while presenting the state of NLP in Ghana. We then present the NLP Ghana organization and outline its aims, scope of work, some of the methods employed and contributions made thus far in the NLP community in Ghana. △ Less

Submitted 1 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: 6 pages paper; Accepted at AfricaNLP @EACL 2021

arXiv:2012.11432 [pdf, other]

Towards the Localisation of Lesions in Diabetic Retinopathy

Authors: Samuel Ofosu Mensah, Bubacarr Bah, Willie Brink

Abstract: Convolutional Neural Networks (CNNs) have successfully been used to classify diabetic retinopathy (DR) fundus images in recent times. However, deeper representations in CNNs may capture higher-level semantics at the expense of spatial resolution. To make predictions usable for ophthalmologists, we use a post-attention technique called Gradient-weighted Class Activation Map** (Grad-CAM) on the pe… ▽ More Convolutional Neural Networks (CNNs) have successfully been used to classify diabetic retinopathy (DR) fundus images in recent times. However, deeper representations in CNNs may capture higher-level semantics at the expense of spatial resolution. To make predictions usable for ophthalmologists, we use a post-attention technique called Gradient-weighted Class Activation Map** (Grad-CAM) on the penultimate layer of deep learning models to produce coarse localisation maps on DR fundus images. This is to help identify discriminative regions in the images, consequently providing evidence for ophthalmologists to make a diagnosis and potentially save lives by early diagnosis. Specifically, this study uses pre-trained weights from four state-of-the-art deep learning models to produce and compare localisation maps of DR fundus images. The models used include VGG16, ResNet50, InceptionV3, and InceptionResNetV2. We find that InceptionV3 achieves the best performance with a test classification accuracy of 96.07%, and localise lesions better and faster than the other models. △ Less

Submitted 2 February, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: 8 pages, 4 figures, used svproc document class, Computing Conference 2021 - London

MSC Class: 68T07 ACM Class: I.4

arXiv:2005.00162 [pdf, other]

Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations

Authors: Kai Sun, Richong Zhang, Samuel Mensah, Yongyi Mao, Xudong Liu

Abstract: The idea of using multi-task learning approaches to address the joint extraction of entity and relation is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods using multi-task learning techniques to address the problem learn interactions among the two tasks through a shared network, where the shared information is passed into the… ▽ More The idea of using multi-task learning approaches to address the joint extraction of entity and relation is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods using multi-task learning techniques to address the problem learn interactions among the two tasks through a shared network, where the shared information is passed into the task-specific networks for prediction. However, such an approach hinders the model from learning explicit interactions between the two tasks to improve the performance on the individual tasks. As a solution, we design a multi-task learning model which we refer to as recurrent interaction network which allows the learning of interactions dynamically, to effectively model task-specific features for classification. Empirical studies on two real-world datasets confirm the superiority of the proposed model. △ Less

Submitted 16 September, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

Showing 1–11 of 11 results for author: Mensah, S