Showing 1–5 of 5 results for author: Garain, A

  1. arXiv:2009.01195  [pdf, other]

    cs.CL

    Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media

    Authors: Avishek Garain

    Abstract: SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language Identification in Social Media (Zampieri et al., 2020). The task was subdivided into multiple languages and datasets were provided for each one. The task was further divided into three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. I have participated i…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: Preprint for SemEval-2020 Task 12 System description paper, 8 pages, 3 figures

  2. arXiv:2007.12561  [pdf, other]

    cs.CL

    JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English code mixed data using Grid Search Cross Validation

    Authors: Avishek Garain, Sainik Kumar Mahata, Dipankar Das

    Abstract: Code-mixing is a phenomenon which arises mainly in multilingual societies. Multilingual people, who are well versed in their native languages and also English speakers, tend to code-mix using English-based phonetic typing and the insertion of anglicisms in their main language. This linguistic phenomenon poses a great challenge to conventional NLP domains such as Sentiment Analysis, Machine Transla…

    Submitted 2 September, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

  3. arXiv:1908.01349  [pdf, other]

    cs.CL

    JUMT at WMT2019 News Translation Task: A Hybrid approach to Machine Translation for Lithuanian to English

    Authors: Sainik Kumar Mahata, Avishek Garain, Adityar Rayala, Dipankar Das, Sivaji Bandyopadhyay

    Abstract: In the current work, we present a description of the system submitted to WMT 2019 News Translation Shared task. The system was created to translate news text from Lithuanian to English. To accomplish the given task, our system used a Word Embedding based Neural Machine Translation model to post edit the outputs generated by a Statistical Machine Translation model. The current paper documents the a…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1908.00323

  4. arXiv:1908.00321  [pdf, other]

    cs.CL

    Sentiment Analysis at SEPLN (TASS)-2019: Sentiment Analysis at Tweet level using Deep Learning

    Authors: Avishek Garain, Sainik Kumar Mahata

    Abstract: This paper describes the system submitted to "Sentiment Analysis at SEPLN (TASS)-2019" shared task. The task includes sentiment analysis of Spanish tweets, where the tweets are in different dialects spoken in Spain, Peru, Costa Rica, Uruguay and Mexico. The tweets are short (up to 240 characters) and the language is informal, i.e., it contains misspellings, emojis, onomatopoeias etc. Sentiment anal…

    Submitted 1 August, 2019; originally announced August 2019.

  5. arXiv:1907.13356  [pdf, ps, other]

    cs.CL

    Normalyzing Numeronyms -- A NLP approach

    Authors: Avishek Garain, Sainik Kumar Mahata, Subhabrata Dutta

    Abstract: This paper presents a method to apply Natural Language Processing for normalizing numeronyms to make them understandable by humans. We approach the problem through a two-step mechanism. We make use of the state of the art Levenshtein distance of words. We then apply Cosine Similarity for selection of the normalized text and reach greater accuracy in solving the problem. Our approach garners accura…

    Submitted 12 November, 2019; v1 submitted 31 July, 2019; originally announced July 2019.