Showing 1–2 of 2 results for author: Duvenhage, B

  1. arXiv:1911.07555  [pdf, ps, other]

    cs.CL

    Short Text Language Identification for Under Resourced Languages

    Authors: Bernardt Duvenhage

    Abstract: The paper presents a hierarchical naive Bayesian and lexicon based classifier for short text language identification (LID) useful for under resourced languages. The algorithm is evaluated on short pieces of text for the 11 official South African languages some of which are similar languages. The algorithm is compared to recent approaches using test sets from previous works on South African languag…

    Submitted 21 November, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: Presented at NeurIPS 2019 Workshop on Machine Learning for the Developing World

    MSC Class: 68T50

  2. arXiv:1711.00247  [pdf, other]

    cs.CL

    Improved Text Language Identification for the South African Languages

    Authors: Bernardt Duvenhage, Mfundo Ntini, Phala Ramonyai

    Abstract: Virtual assistants and text chatbots have recently been gaining popularity. Given the short message nature of text-based chat interactions, the language identification systems of these bots might only have 15 or 20 characters to make a prediction. However, accurate text language identification is important, especially in the early stages of many multilingual natural language processing pipelines.…

    Submitted 1 November, 2017; originally announced November 2017.

    Comments: Accepted to appear in the proceedings of The 28th Annual Symposium of the Pattern Recognition Association of South Africa, 2017