-
Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement
Authors:
Samuel Mensah,
Kai Sun,
Nikolaos Aletras
Abstract:
State-of-the-art target-oriented opinion word extraction (TOWE) models typically use BERT-based text encoders that operate on the word level, along with graph convolutional networks (GCNs) that incorporate syntactic information extracted from syntax trees. These methods achieve limited gains with GCNs and have difficulty using BERT wordpieces. Meanwhile, BERT wordpieces are known to be effective at representing rare words or words with insufficient context information. To address this issue, this work trades syntax trees for BERT wordpieces by entirely removing the GCN component from the methods' architectures. To enhance TOWE performance, we tackle the issue of aspect representation loss during encoding. Instead of solely utilizing a sentence as the input, we use a sentence-aspect pair. Our relatively simple approach achieves state-of-the-art results on benchmark datasets and should serve as a strong baseline for further research.
Submitted 18 May, 2023;
originally announced May 2023.
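The sentence-aspect pair input mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard BERT pair layout (`[CLS] sentence [SEP] aspect [SEP]` with segment ids 0 and 1); the abstract does not state the exact layout the authors use, and the function name `build_pair_input` is hypothetical. Tokens are passed in already split, so no real wordpiece tokenizer is involved.

```python
from typing import List, Tuple


def build_pair_input(
    sentence_tokens: List[str], aspect_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Build a BERT-style sentence-aspect pair (hypothetical helper).

    Assumption: the standard BERT pair layout is used, with segment id 0
    for the sentence part and segment id 1 for the aspect part.
    """
    tokens = ["[CLS]"] + sentence_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the sentence and the first [SEP];
    # segment 1 covers the aspect and the final [SEP].
    segment_ids = [0] * (len(sentence_tokens) + 2) + [1] * (len(aspect_tokens) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    toks, segs = build_pair_input(["the", "pizza", "was", "great"], ["pizza"])
    for tok, seg in zip(toks, segs):
        print(seg, tok)
```

Repeating the aspect in a second segment gives the encoder an explicit copy of the target, which is one way to counter the aspect representation loss the abstract describes.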
-
Self-training through Classifier Disagreement for Cross-Domain Opinion Target Extraction
Authors:
Kai Sun,
Richong Zhang,
Samuel Mensah,
Nikolaos Aletras,
Yongyi Mao,
Xudong Liu
Abstract:
Opinion target extraction (OTE) or aspect extraction (AE) is a fundamental task in opinion mining that aims to extract the targets (or aspects) on which opinions have been expressed. Recent work focuses on cross-domain OTE, which is typically encountered in real-world scenarios, where the testing and training distributions differ. Most methods use domain adversarial neural networks that aim to reduce the domain gap between the labelled source and unlabelled target domains to improve target domain performance. However, this approach only aligns feature distributions and does not account for class-wise feature alignment, leading to suboptimal results. Semi-supervised learning (SSL) has been explored as a solution, but is limited by the quality of pseudo-labels generated by the model. Inspired by the theoretical foundations in domain adaptation [2], we propose a new SSL approach that selects unlabelled target samples on which the outputs of a domain-specific teacher network and a student network disagree, in an effort to boost target domain performance. Extensive experiments on benchmark cross-domain OTE datasets show that this approach is effective and performs consistently well in settings with large domain shifts.
Submitted 28 February, 2023;
originally announced February 2023.
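The disagreement-based selection described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: predictions are per-token label sequences, a sample counts as a disagreement when the two sequences differ anywhere, and the function name `select_disagreements` is hypothetical. How the authors score disagreement or assign pseudo-labels is not given in the abstract.

```python
from typing import List, Sequence


def select_disagreements(
    teacher_preds: Sequence[Sequence[str]],
    student_preds: Sequence[Sequence[str]],
) -> List[int]:
    """Return indices of unlabelled target samples where the two networks disagree.

    Assumption: each prediction is a sequence of per-token tags (e.g. BIO
    tags), and any difference between the sequences counts as disagreement.
    """
    if len(teacher_preds) != len(student_preds):
        raise ValueError("teacher and student must predict on the same samples")
    return [
        i
        for i, (t, s) in enumerate(zip(teacher_preds, student_preds))
        if list(t) != list(s)
    ]


if __name__ == "__main__":
    teacher = [["O", "B", "I"], ["O", "O"], ["B", "O"]]
    student = [["O", "B", "I"], ["B", "O"], ["B", "O"]]
    print(select_disagreements(teacher, student))
```

In a self-training loop, the selected indices would mark the target samples fed into the next training round.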
-
A Hierarchical N-Gram Framework for Zero-Shot Link Prediction
Authors:
Mingchen Li,
Junfan Chen,
Samuel Mensah,
Nikolaos Aletras,
Xiulong Yang,
Yang Ye
Abstract:
Due to the incompleteness of knowledge graphs (KGs), zero-shot link prediction (ZSLP), which aims to predict unobserved relations in KGs, has attracted recent interest from researchers. A common solution is to use textual features of relations (e.g., surface name or textual descriptions) as auxiliary information to bridge the gap between seen and unseen relations. Current approaches learn an embedding for each word token in the text. These methods lack robustness as they suffer from the out-of-vocabulary (OOV) problem. Meanwhile, models built on character n-grams have the capability of generating expressive representations for OOV words. Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP. Our approach works by first constructing a hierarchical n-gram graph on the surface name to model the organizational structure of n-grams that leads to the surface name. A GramTransformer, based on the Transformer, is then presented to model the hierarchical n-gram graph and construct the relation embedding for ZSLP. Experimental results show that the proposed HNZSLP achieves state-of-the-art performance on two ZSLP datasets.
Submitted 6 January, 2023; v1 submitted 16 April, 2022;
originally announced April 2022.
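One plausible reading of the hierarchical n-gram graph described in the abstract above is sketched below. It assumes that each character n-gram is linked to the two (n-1)-grams it contains (its prefix and its suffix), so the levels build up to the full surface name. The abstract does not specify the actual graph construction, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def char_ngrams(name: str, n: int) -> List[str]:
    """All contiguous character n-grams of `name`, left to right."""
    return [name[i : i + n] for i in range(len(name) - n + 1)]


def build_ngram_hierarchy(
    name: str, max_n: Optional[int] = None
) -> Tuple[Dict[int, List[str]], Set[Tuple[str, str]]]:
    """Build n-gram levels and child-to-parent edges (illustrative only).

    Assumption: an n-gram's children are its (n-1)-gram prefix and suffix.
    """
    max_n = len(name) if max_n is None else min(max_n, len(name))
    levels = {n: char_ngrams(name, n) for n in range(1, max_n + 1)}
    edges: Set[Tuple[str, str]] = set()
    for n in range(2, max_n + 1):
        for gram in levels[n]:
            edges.add((gram[:-1], gram))  # prefix (n-1)-gram -> n-gram
            edges.add((gram[1:], gram))   # suffix (n-1)-gram -> n-gram
    return levels, edges


if __name__ == "__main__":
    levels, edges = build_ngram_hierarchy("born")
    for n, grams in levels.items():
        print(n, grams)
    print(sorted(edges))
```

Because every level is derived from characters alone, an unseen relation name still yields a full hierarchy, which is the robustness to OOV words that the abstract motivates.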
-
An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction
Authors:
Samuel Mensah,
Kai Sun,
Nikolaos Aletras
Abstract:
Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations while using a GCN only achieves marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures.
Submitted 2 September, 2021;
originally announced September 2021.
-
Investigating the Significance of Bellwether Effect to Improve Software Effort Estimation
Authors:
Solomon Mensah,
Jacky Keung,
Stephen G. MacDonell,
Michael F. Bosu,
Kwabena E. Bennin
Abstract:
The Bellwether effect refers to the existence of exemplary projects (called the Bellwether) within a historical dataset to be used for improved prediction performance. Recent studies have shown an implicit assumption of using recently completed projects (referred to as moving window) for improved prediction accuracy. In this paper, we investigate the Bellwether effect on software effort estimation accuracy using moving windows. The existence of the Bellwether was empirically proven based on six postulations. We apply statistical stratification and Markov chain methodology to select the Bellwether moving window. The resulting Bellwether moving window is used to predict the software effort of a new project. Empirical results show that the Bellwether effect exists in chronological datasets with a set of exemplary and recently completed projects representing the Bellwether moving window. Results from this study show that the use of the Bellwether moving window with the Gaussian weighting function significantly improves the prediction accuracy.
Submitted 28 May, 2021;
originally announced May 2021.
-
Investigating the Significance of the Bellwether Effect to Improve Software Effort Prediction: Further Empirical Study
Authors:
Solomon Mensah,
Jacky Keung,
Stephen G. MacDonell,
Michael Franklin Bosu,
Kwabena Ebo Bennin
Abstract:
Context: In addressing how best to estimate how much effort is required to develop software, a recent study found that using exemplary and recently completed projects [forming Bellwether moving windows (BMW)] in software effort prediction (SEP) models leads to relatively improved accuracy. More studies need to be conducted to determine whether the BMW yields improved accuracy in general, since different sizing and aging parameters of the BMW are known to affect accuracy. Objective: To investigate the existence of exemplary projects (Bellwethers) with defined window size and age parameters, and whether their use in SEP improves prediction accuracy. Method: We empirically investigate the moving window assumption based on the theory that the prediction outcome of a future event depends on the outcomes of prior events. Sampling of Bellwethers was undertaken using three introduced Bellwether methods (SSPM, SysSam, and RandSam). The ergodic Markov chain was used to determine the stationarity of the Bellwethers. Results: Empirical results show that 1) Bellwethers exist in SEP and 2) the BMW has an approximate size of 50 to 80 exemplary projects that should not be more than 2 years old relative to the new projects to be estimated. Conclusion: The study's results add further weight to the recommended use of Bellwethers for improved prediction accuracy in SEP.
Submitted 16 May, 2021;
originally announced May 2021.
-
Contextual Text Embeddings for Twi
Authors:
Paul Azunre,
Salomey Osei,
Salomey Addo,
Lawrence Asamoah Adu-Gyamfi,
Stephen Moore,
Bernard Adabankah,
Bernard Opoku,
Clara Asare-Nyarko,
Samuel Nyarko,
Cynthia Amoaba,
Esther Dansoa Appiah,
Felix Akwerh,
Richard Nii Lante Lawson,
Joel Budu,
Emmanuel Debrah,
Nana Boateng,
Wisdom Ofori,
Edwin Buabeng-Munkoh,
Franklin Adjei,
Isaac Kojo Essel Ampomah,
Joseph Otoo,
Reindorf Borkor,
Standylove Birago Mensah,
Lucien Mensah,
Mark Amoako Marcel
, et al. (2 additional authors not shown)
Abstract:
Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, Russian, etc. However, this technology does not yet exist for any Ghanaian language. In this paper, we introduce the first such models for Twi or Akan, the most widely spoken Ghanaian language. The specific contribution of this research work is the development of several pretrained transformer language models for the Akuapem and Asante dialects of Twi, paving the way for advances in application areas such as Named Entity Recognition (NER), Neural Machine Translation (NMT), Sentiment Analysis (SA) and Part-of-Speech (POS) tagging. Specifically, we introduce four different flavours of ABENA (A BERT model Now in Akan), which is fine-tuned on a set of Akan corpora, and BAKO (BERT with Akan Knowledge only), which is trained from scratch. We open-source the model through the Hugging Face model hub and demonstrate its use via a simple sentiment classification example.
Submitted 31 March, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
English-Twi Parallel Corpus for Machine Translation
Authors:
Paul Azunre,
Salomey Osei,
Salomey Addo,
Lawrence Asamoah Adu-Gyamfi,
Stephen Moore,
Bernard Adabankah,
Bernard Opoku,
Clara Asare-Nyarko,
Samuel Nyarko,
Cynthia Amoaba,
Esther Dansoa Appiah,
Felix Akwerh,
Richard Nii Lante Lawson,
Joel Budu,
Emmanuel Debrah,
Nana Boateng,
Wisdom Ofori,
Edwin Buabeng-Munkoh,
Franklin Adjei,
Isaac Kojo Essel Ampomah,
Joseph Otoo,
Reindorf Borkor,
Standylove Birago Mensah,
Lucien Mensah,
Mark Amoako Marcel
, et al. (2 additional authors not shown)
Abstract:
We present a parallel machine translation training corpus for English and Akuapem Twi of 25,421 sentence pairs. We used a transformer-based translator to generate initial translations in Akuapem Twi, which were later verified and corrected where necessary by native speakers to eliminate any occurrence of translationese. In addition, 697 higher-quality crowd-sourced sentences are provided for use as an evaluation set for downstream Natural Language Processing (NLP) tasks. The typical use case for the larger human-verified dataset is further training of machine translation models in Akuapem Twi. The higher-quality crowd-sourced dataset of 697 sentences is recommended as a testing dataset for English-to-Twi and Twi-to-English machine translation models. Furthermore, the Twi part of the crowd-sourced data may also be used for other tasks, such as representation learning, classification, etc. We fine-tune the transformer translation model on the training corpus and report benchmarks on the crowd-sourced test set.
Submitted 1 April, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
NLP for Ghanaian Languages
Authors:
Paul Azunre,
Salomey Osei,
Salomey Addo,
Lawrence Asamoah Adu-Gyamfi,
Stephen Moore,
Bernard Adabankah,
Bernard Opoku,
Clara Asare-Nyarko,
Samuel Nyarko,
Cynthia Amoaba,
Esther Dansoa Appiah,
Felix Akwerh,
Richard Nii Lante Lawson,
Joel Budu,
Emmanuel Debrah,
Nana Boateng,
Wisdom Ofori,
Edwin Buabeng-Munkoh,
Franklin Adjei,
Isaac Kojo Essel Ampomah,
Joseph Otoo,
Reindorf Borkor,
Standylove Birago Mensah,
Lucien Mensah,
Mark Amoako Marcel
, et al. (2 additional authors not shown)
Abstract:
NLP Ghana is an open-source non-profit organization aiming to advance the development and adoption of state-of-the-art NLP techniques and digital language tools for Ghanaian languages and problems. In this paper, we first present the motivation and necessity for the efforts of the organization by introducing some popular Ghanaian languages and presenting the state of NLP in Ghana. We then present the NLP Ghana organization and outline its aims, scope of work, some of the methods employed and contributions made thus far in the NLP community in Ghana.
Submitted 1 April, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
Towards the Localisation of Lesions in Diabetic Retinopathy
Authors:
Samuel Ofosu Mensah,
Bubacarr Bah,
Willie Brink
Abstract:
Convolutional Neural Networks (CNNs) have successfully been used to classify diabetic retinopathy (DR) fundus images in recent times. However, deeper representations in CNNs may capture higher-level semantics at the expense of spatial resolution. To make predictions usable for ophthalmologists, we use a post-attention technique called Gradient-weighted Class Activation Mapping (Grad-CAM) on the penultimate layer of deep learning models to produce coarse localisation maps on DR fundus images. This is to help identify discriminative regions in the images, consequently providing evidence for ophthalmologists to make a diagnosis and potentially save lives by early diagnosis. Specifically, this study uses pre-trained weights from four state-of-the-art deep learning models to produce and compare localisation maps of DR fundus images. The models used include VGG16, ResNet50, InceptionV3, and InceptionResNetV2. We find that InceptionV3 achieves the best performance with a test classification accuracy of 96.07%, and localises lesions better and faster than the other models.
Submitted 2 February, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Recurrent Interaction Network for Jointly Extracting Entities and Classifying Relations
Authors:
Kai Sun,
Richong Zhang,
Samuel Mensah,
Yongyi Mao,
Xudong Liu
Abstract:
The idea of using multi-task learning approaches to address the joint extraction of entities and relations is motivated by the relatedness between the entity recognition task and the relation classification task. Existing methods using multi-task learning techniques to address the problem learn interactions between the two tasks through a shared network, where the shared information is passed into the task-specific networks for prediction. However, such an approach hinders the model from learning explicit interactions between the two tasks to improve the performance on the individual tasks. As a solution, we design a multi-task learning model, which we refer to as a recurrent interaction network, that allows interactions to be learned dynamically to effectively model task-specific features for classification. Empirical studies on two real-world datasets confirm the superiority of the proposed model.
Submitted 16 September, 2020; v1 submitted 30 April, 2020;
originally announced May 2020.