
Showing 1–6 of 6 results for author: Adewumi, T P

Searching in archive cs.
  1. arXiv:2105.03280  [pdf, other]

    cs.CL cs.LG

    Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms

    Authors: Tosin P. Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaidou, Foteini Liwicki, Marcus Liwicki

    Abstract: We present a fairly large, Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges with NLP systems with regards to tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes such as it is in this work. To the best of the authors' knowledge,…

    Submitted 23 April, 2022; v1 submitted 25 April, 2021; originally announced May 2021.

    Comments: Accepted at the International Conference on Language Resources and Evaluation (LREC) 2022

  2. arXiv:2011.07605  [pdf, ps, other]

    cs.CL cs.LG

    The Challenge of Diacritics in Yoruba Embeddings

    Authors: Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

    Abstract: The major contributions of this work include the empirical establishment of a better performance for Yoruba embeddings from undiacritized (normalized) dataset and provision of new analogy sets for evaluation. The Yoruba language, being a tonal language, utilizes diacritics (tonal marks) in written form. We show that this affects embedding performance by creating embeddings from exactly the same Wi…

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: Presented at NeurIPS 2020 Workshop on Machine Learning for the Developing World

  3. arXiv:2011.03281  [pdf, other]

    cs.CL cs.LG

    Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora

    Authors: Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

    Abstract: In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles. We evaluate embeddings based on two Swedish corpora: The G…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: Presented at the Eighth Swedish Language Technology Conference (SLTC)

  4. arXiv:2007.16007  [pdf, other]

    cs.CL cs.LG

    Exploring Swedish & English fastText Embeddings for NER with the Transformer

    Authors: Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

    Abstract: In this paper, our main contributions are that embeddings from relatively smaller corpora can outperform ones from larger corpora and we make the new Swedish analogy test set publicly available. To achieve a good network performance in natural language processing (NLP) downstream tasks, several factors play important roles: dataset size, the right hyper-parameters, and well-trained embeddings. We…

    Submitted 17 April, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: 11 pages, 2 figures, 8 tables; added new references and clarification about other possible models for NER

  5. arXiv:2003.11645  [pdf, other]

    cs.CL cs.LG stat.ML

    Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks

    Authors: Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

    Abstract: Word2Vec is a prominent model for natural language processing (NLP) tasks. Similar inspiration is found in distributed embeddings for new state-of-the-art (SotA) deep neural networks. However, wrong combination of hyper-parameters can produce poor quality vectors. The objective of this work is to empirically show optimal combination of hyper-parameters exists and evaluate various combinations. We…

    Submitted 17 April, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: 8 pages, 7 figures, 6 tables; added new references based on new input in the result section about CI

  6. Inner For-Loop for Speeding Up Blockchain Mining

    Authors: Tosin P. Adewumi, Marcus Liwicki

    Abstract: In this paper, the authors propose to increase the efficiency of blockchain mining by using a population-based approach. Blockchain relies on solving difficult mathematical problems as proof-of-work within a network before blocks are added to the chain. Brute force approach, advocated by some as the fastest algorithm for solving partial hash collisions and implemented in Bitcoin blockchain, implie…

    Submitted 26 February, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: 6 pages, 1 table and 2 figures

    Journal ref: Open Computer Science, 10(1), pp. 42-47 (2020)
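
The Yoruba entry (arXiv:2011.07605) compares embeddings trained on diacritized text with embeddings trained on an undiacritized (normalized) copy of the same data. The abstract does not say how that normalization was performed. The sketch below assumes a common approach, Unicode NFD decomposition followed by dropping every combining mark; the function name `undiacritize` is hypothetical. Note that this removes both tone marks and underdots (so ẹ becomes e), which merges letters that are distinct in standard Yoruba orthography.

```python
import unicodedata


def undiacritize(text: str) -> str:
    """Strip combining diacritics (tone marks and underdots) from text.

    Hypothetical helper: NFD splits each accented letter into a base letter
    plus combining marks, the marks are dropped, and NFC recomposes the rest.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    print(undiacritize("Yorùbá"))  # Yoruba
    print(undiacritize("ọmọ"))     # omo
```

Running the same corpus through such a function is one way to obtain the two parallel datasets (with and without diacritics) that the comparison needs.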
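
Two of the entries evaluate embeddings with analogy test sets (new Yoruba analogy sets in arXiv:2011.07605 and a Swedish analogy test set in arXiv:2007.16007). A minimal sketch of how such questions are commonly scored, assuming the standard vector-offset (3CosAdd) rule on unit-length vectors; the toy vectors, question list, and helper names are hypothetical and are not data or code from the papers.

```python
import math

# Hypothetical 2-D toy vectors; real embeddings have hundreds of dimensions.
EMB = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, -2.0],
}

# Each question reads: a is to b as c is to d.
QUESTIONS = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
]


def unit(v):
    """Scale a non-zero vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def solve_analogy(emb, a, b, c):
    """Return the word x maximising cos(x, b - a + c) over unit vectors (3CosAdd)."""
    target = [vb - va + vc for va, vb, vc in zip(unit(emb[a]), unit(emb[b]), unit(emb[c]))]
    candidates = [w for w in emb if w not in (a, b, c)]  # question words are excluded
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of questions whose predicted word equals the expected one."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)


if __name__ == "__main__":
    print(analogy_accuracy(EMB, QUESTIONS))  # 1.0
```

Excluding the three question words matters: with these toy vectors, the first question would otherwise return "woman", whose direction matches the offset target exactly.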
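
The Word2Vec entry (arXiv:2003.11645) evaluates various combinations of hyper-parameters. A minimal sketch of enumerating such combinations as an exhaustive grid; the search-space values below are hypothetical placeholders, since the abstract does not list the values studied. The mapping targets the keyword arguments of gensim 4.x `Word2Vec` (`sg`, `hs`, `negative`, `window`, `vector_size`), and the default of 5 negative samples is an assumption.

```python
from itertools import product

# Hypothetical search space for illustration only; the values evaluated
# in the paper are not given in the abstract.
SEARCH_SPACE = {
    "architecture": ["skip-gram", "CBOW"],
    "training": ["hierarchical softmax", "negative sampling"],
    "window": [4, 8],
    "dimension": [100, 300],
}


def hyperparameter_grid(space):
    """Yield every combination in the search space as a dict (exhaustive grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def to_gensim_kwargs(config, negative=5):
    """Map one config onto gensim 4.x Word2Vec keyword arguments.

    `negative=5` is an assumed sample count; it is set to 0 when hierarchical
    softmax is chosen so that only one training objective is active.
    """
    use_hs = config["training"] == "hierarchical softmax"
    return {
        "sg": 1 if config["architecture"] == "skip-gram" else 0,
        "hs": 1 if use_hs else 0,
        "negative": 0 if use_hs else negative,
        "window": config["window"],
        "vector_size": config["dimension"],
    }


if __name__ == "__main__":
    configs = list(hyperparameter_grid(SEARCH_SPACE))
    print(len(configs))  # 16
    print(to_gensim_kwargs(configs[0]))
```

Each resulting dict could then be passed as `Word2Vec(sentences, **kwargs)` and the trained vectors scored on an analogy set or a downstream task; the block itself does not import gensim.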
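
The blockchain entry contrasts its population-based proposal with the brute-force approach to proof-of-work, i.e. searching nonces until a hash has a required prefix (a partial hash collision). The sketch below shows only that brute-force baseline, not the authors' Inner For-Loop method. It is simplified under stated assumptions: a single SHA-256 over a string payload and a difficulty counted in leading hexadecimal zeros, whereas Bitcoin hashes a binary block header twice and compares it with a numeric target. The function name `mine` and the payload are hypothetical.

```python
import hashlib


def mine(block_data: str, difficulty: int, max_nonce: int = 10_000_000):
    """Brute-force nonce search for a simplified proof-of-work.

    Accepts the first nonce for which SHA-256(block_data + nonce) starts with
    `difficulty` hexadecimal zeros. Returns (nonce, digest), or None if no
    nonce below max_nonce qualifies.
    """
    prefix = "0" * difficulty
    for nonce in range(max_nonce):  # try nonces in order: plain brute force
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None


if __name__ == "__main__":
    print(mine("hypothetical block payload", difficulty=4))
```

Each extra hexadecimal zero multiplies the expected number of attempts by 16, which is why the cost of this sequential search grows quickly with difficulty.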