Showing 1–14 of 14 results for author: Awekar, A

Searching in archive cs.
  1. arXiv:2312.17292  [pdf, ps, other]

    cs.CL

    Effect of dimensionality change on the bias of word embeddings

    Authors: Rohit Raj Rai, Amit Awekar

    Abstract: Word embedding methods (WEMs) are extensively used for representing text data. The dimensionality of these embeddings varies across various tasks and implementations. The effect of dimensionality change on the accuracy of the downstream task is a well-explored question. However, how the dimensionality change affects the bias of word embeddings needs to be investigated. Using the English Wikipedia…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in the Young Research Symposium Track of ACM CODS-COMAD 2024. 2 pages

  2. arXiv:2311.12298  [pdf, other]

    cs.CL cs.AI

    Noise in Relation Classification Dataset TACRED: Characterization and Reduction

    Authors: Akshay Parekh, Ashish Anand, Amit Awekar

    Abstract: The overarching objective of this paper is two-fold. First, to explore model-based approaches to characterize the primary cause of the noise in the RE dataset TACRED. Second, to identify the potentially noisy instances. Towards the first objective, we analyze predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED sho…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Work in Progress

  3. arXiv:2112.13320  [pdf, other]

    cs.CL cs.AI

    Budget Sensitive Reannotation of Noisy Relation Classification Data Using Label Hierarchy

    Authors: Akshay Parekh, Ashish Anand, Amit Awekar

    Abstract: Large crowd-sourced datasets are often noisy and relation classification (RC) datasets are no exception. Reannotating the entire dataset is one probable solution however it is not always viable due to time and budget constraints. This paper addresses the problem of efficient reannotation of a large noisy dataset for the RC. Our goal is to catch more annotation errors in the dataset while reannotat…

    Submitted 26 December, 2021; originally announced December 2021.

  4. arXiv:2104.08433  [pdf, other]

    cs.CL cs.IR

    Are Word Embedding Methods Stable and Should We Care About It?

    Authors: Angana Borah, Manash Pratim Barman, Amit Awekar

    Abstract: A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate dense vector representation for each word in the given text data. The central idea of this paper is to explore the stability measurement of WEMs using intrinsic ev…

    Submitted 11 June, 2024; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted to ACM Hypertext 2021

  5. arXiv:1909.06249  [pdf, other]

    cs.CL

    Taxonomical hierarchy of canonicalized relations from multiple Knowledge Bases

    Authors: Akshay Parekh, Ashish Anand, Amit Awekar

    Abstract: This work addresses two important questions pertinent to Relation Extraction (RE). First, what are all possible relations that could exist between any two given entity types? Second, how do we define an unambiguous taxonomical (is-a) hierarchy among the identified relations? To address the first question, we use three resources Wikipedia Infobox, Wikidata, and DBpedia. This study focuses on relati…

    Submitted 12 November, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: Accepted at CoDS-COMAD 2020

  6. arXiv:1907.07818  [pdf, other]

    cs.IR cs.CL

    Decoding the Style and Bias of Song Lyrics

    Authors: Manash Pratim Barman, Amit Awekar, Sambhav Kothari

    Abstract: The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: style and biases of song lyrics. All prior works to understand these two aspects are limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize the lyrics style in terms of vocab…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Accepted for ACM SIGIR 2019

  7. arXiv:1904.13178  [pdf, other]

    cs.CL

    Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage

    Authors: Abhishek Abhishek, Sanya Bathla Taneja, Garima Malik, Ashish Anand, Amit Awekar

    Abstract: Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions to a large set of types spanning diverse domains such as biomedical, finance and sports. We observe that when the type set spans several domains, detection of entity mention becomes a limitation for supervised learning models. The primary reason being lack of dataset where entity boundaries are properly…

    Submitted 30 April, 2019; originally announced April 2019.

    Comments: Camera ready version, AKBC 2019. Code and data available at https://github.com/abhipec/HAnDS

  8. arXiv:1901.05227  [pdf, other]

    cs.IR

    It's Only Words And Words Are All I Have

    Authors: Manash Pratim Barman, Kavish Dahekar, Abhinav Anshuman, Amit Awekar

    Abstract: The central idea of this paper is to demonstrate the strength of lyrics for music mining and natural language processing (NLP) tasks using the distributed representation paradigm. For music mining, we address two prediction tasks for songs: genre and popularity. Existing works for both these problems have two major bottlenecks. First, they represent lyrics using handcrafted features that require i…

    Submitted 16 January, 2019; originally announced January 2019.

    Comments: Accepted for ECIR 2019

  9. arXiv:1810.08782  [pdf, other]

    cs.CL cs.AI

    Collective Learning From Diverse Datasets for Entity Typing in the Wild

    Authors: Abhishek Abhishek, Amar Prakash Azad, Balaji Ganesan, Ashish Anand, Amit Awekar

    Abstract: Entity typing (ET) is the problem of assigning labels to given entity mentions in a sentence. Existing works for ET require knowledge about the domain and target label set for a given test instance. ET in the absence of such knowledge is a novel problem that we address as ET in the wild. We hypothesize that the solution to this problem is to build supervised models that generalize better on the ET…

    Submitted 16 September, 2019; v1 submitted 20 October, 2018; originally announced October 2018.

    Comments: Accepted at EYRE'19 Workshop, CIKM 2019

  10. arXiv:1801.06482  [pdf, other]

    cs.IR cs.CL cs.SI

    Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms

    Authors: Sweta Agrawal, Amit Awekar

    Abstract: Harassment by cyberbullies is a significant phenomenon on the social media. Existing works for cyberbullying detection have at least one of the following three bottlenecks. First, they target only one particular social media platform (SMP). Second, they address just one topic of cyberbullying. Third, they rely on carefully handcrafted features of the data. We show that deep learning based models c…

    Submitted 19 January, 2018; originally announced January 2018.

    Comments: Accepted for ECIR 2018

  11. arXiv:1702.06709  [pdf, other]

    cs.CL

    Fine-Grained Entity Type Classification by Jointly Learning Representations and Label Embeddings

    Authors: Abhishek, Ashish Anand, Amit Awekar

    Abstract: Fine-grained entity type classification (FETC) is the task of classifying an entity mention to a broad set of types. Distant supervision paradigm is extensively used to generate training data for this task. However, generated training data assigns same set of labels to every mention of an entity without considering its local context. Existing FETC systems have two major drawbacks: assuming trainin…

    Submitted 22 February, 2017; originally announced February 2017.

    Comments: 11 pages, 5 figures, accepted at EACL 2017 conference

  12. arXiv:1701.09049  [pdf, other]

    cs.DB cs.IR

    Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets

    Authors: Panthadeep Bhattacharjee, Amit Awekar

    Abstract: Incremental data mining algorithms process frequent updates to dynamic datasets efficiently by avoiding redundant computation. Existing incremental extension to shared nearest neighbor density based clustering (SNND) algorithm cannot handle deletions to dataset and handles insertions only one point at a time. We present an incremental algorithm to overcome both these bottlenecks by efficiently ide…

    Submitted 31 January, 2017; originally announced January 2017.

    Comments: 6 pages, Accepted at ECIR 2017

  13. arXiv:1701.04600  [pdf, ps, other]

    cs.LG cs.IR

    Faster K-Means Cluster Estimation

    Authors: Siddhesh Khandelwal, Amit Awekar

    Abstract: There has been considerable work on improving popular clustering algorithm 'K-means' in terms of mean squared error (MSE) and speed, both. However, most of the k-means variants tend to compute distance of each data point to each cluster centroid for every iteration. We propose a fast heuristic to overcome this bottleneck with only marginal increase in MSE. We observe that across all iterations of…

    Submitted 17 January, 2017; originally announced January 2017.

    Comments: 6 pages, Accepted at ECIR 2017

  14. arXiv:1701.02617  [pdf, other]

    cs.IR cs.DL

    On Low Overlap Among Search Results of Academic Search Engines

    Authors: Anasua Mitra, Amit Awekar

    Abstract: Number of published scholarly articles is growing exponentially. To tackle this information overload, researchers are increasingly depending on niche academic search engines. Recent works have shown that two major general web search engines: Google and Bing, have high level of agreement in their top search results. In contrast, we show that various academic search engines have low degree of agreem…

    Submitted 10 January, 2017; originally announced January 2017.

    Comments: 2 pages, submitted to ACM WWW Conference 2017

    ACM Class: H.3.3