Showing 1–7 of 7 results for author: Ansari, M Z

Searching in archive cs.
  1. arXiv:2107.01202

    cs.CL cs.LG

    Language Identification of Hindi-English tweets using code-mixed BERT

    Authors: Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim

    Abstract: Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English-speaking states. Prior knowledge from pre-trained contextual embeddings has shown state-of-the-art results for a range of downstream tasks. Recently, models such as BERT have shown that using a large amount of unlabeled data, th…

    Submitted 2 July, 2021; originally announced July 2021.

  2. arXiv:2106.15105

    cs.CL

    Language Lexicons for Hindi-English Multilingual Text Processing

    Authors: Mohd Zeeshan Ansari, Tanvir Ahmad, Noaima Bari

    Abstract: Language identification in textual documents is the process of automatically detecting the language contained in a document based on its content. Present language identification techniques presume that a document contains text in one of a fixed set of languages; however, this presumption is incorrect when dealing with a multilingual document that includes content in more than one possible lan…

    Submitted 29 June, 2021; originally announced June 2021.

  3. arXiv:2106.15102

    cs.CL

    A Simple and Efficient Probabilistic Language model for Code-Mixed Text

    Authors: M Zeeshan Ansari, Tanvir Ahmad, M M Sufyan Beg, Asma Ikram

    Abstract: Conventional natural language processing approaches are not suited to social media text because of its colloquial discourse and non-homogeneous characteristics. Notably, language identification in a multilingual document is a preceding subtask in several information extraction applications such as information retrieval, named entity recognition, relation extraction,…

    Submitted 29 June, 2021; originally announced June 2021.

  4. arXiv:2007.10604

    cs.SI cs.CY cs.LG

    Inferring Political Preferences from Twitter

    Authors: Mohd Zeeshan Ansari, Areesha Fatima Siddiqui, Mohammad Anas

    Abstract: Sentiment analysis is the task of automatically analyzing the opinions and emotions of users towards an entity or some aspect of that entity. Political sentiment analysis of social media helps political strategists scrutinize the performance of a party or candidate and improve on their weaknesses well before the actual elections. During elections, social networks get flooded with bl…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: International Conference on Emerging Technologies in Data Mining and Information Security IEMIS 2020

  5. arXiv:2007.05727

    cs.CL cs.LG

    Feature Selection on Noisy Twitter Short Text Messages for Language Identification

    Authors: Mohd Zeeshan Ansari, Tanvir Ahmad, Ana Fatima

    Abstract: The task of written language identification typically involves detecting the languages present in a sample of text. Moreover, a sequence of text may not belong to a single language but may instead be a mixture of text written in multiple languages. This kind of text is generated in large volumes on social media platforms owing to their flexible and user-friendly environment. Such text con…

    Submitted 11 July, 2020; originally announced July 2020.

    Journal ref: International Journal of Recent Technology and Engineering, Volume-8, Issue-4, Nov 2019

  6. arXiv:1901.07867

    cs.CL

    Context based Analysis of Lexical Semantics for Hindi Language

    Authors: Mohd Zeeshan Ansari, Lubna Khan

    Abstract: A word having multiple senses in a text introduces the lexical semantic task of finding out which particular sense is appropriate for the given context. One such task is word sense disambiguation, which refers to the identification of the most appropriate meaning of a polysemous word in a given context using computational algorithms. The language processing research in Hindi, the official language o…

    Submitted 23 January, 2019; originally announced January 2019.

    Comments: Accepted in NGCT-2018

  7. arXiv:1810.03430

    cs.IR cs.CL cs.LG

    Cross Script Hindi English NER Corpus from Wikipedia

    Authors: Mohd Zeeshan Ansari, Tanvir Ahmad, Md Arshad Ali

    Abstract: The text generated on social media platforms is essentially mixed-lingual text. The mixing of languages in any form produces a considerable amount of difficulty for language processing systems. Moreover, advancements in language processing research depend upon the availability of standard corpora. The development of mixed-lingual Indian Named Entity Recognition (NER) systems is facing obstacle…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI-2018)