Showing 1–14 of 14 results for author: Rudra, K

  1. arXiv:2311.15426  [pdf, other]

    cs.IR

    Data Augmentation for Sample Efficient and Robust Document Ranking

    Authors: Abhijit Anand, Jurek Leonhardt, Jaspreet Singh, Koustav Rudra, Avishek Anand

    Abstract: Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine-tuning. In this paper, we propose data-augmentation methods for effective and robust ranking performance. One of the key benefits of using data augmenta…

    Submitted 26 November, 2023; originally announced November 2023.

  2. arXiv:2311.01263  [pdf, other]

    cs.IR

    Efficient Neural Ranking using Forward Indexes and Lightweight Encoders

    Authors: Jurek Leonhardt, Henrik Müller, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

    Abstract: Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transformer-based language models, which are notoriously inefficient in terms of resources and latency. We propose Fast-Forward indexes -- vector forward indexes which exploit the semantic matching capabilities of dual-encoder models for efficient and effective re-ranking. Our framework enables re-ranking…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted at ACM TOIS. arXiv admin note: text overlap with arXiv:2110.06051

  3. arXiv:2304.11485  [pdf, other]

    cs.CL cs.AI cs.LG cs.SI

    Understanding Lexical Biases when Identifying Gang-related Social Media Communications

    Authors: Dhiraj Murthy, Constantine Caramanis, Koustav Rudra

    Abstract: Individuals involved in gang-related activity use mainstream social media including Facebook and Twitter to express taunts and threats as well as grief and memorializing. However, identifying the impact of gang-related activity in order to serve community member needs through social media sources has a unique set of challenges. This includes the difficulty of ethically identifying training data of…

    Submitted 22 April, 2023; originally announced April 2023.

    ACM Class: J.4; K.4.2; I.2.7

  4. arXiv:2302.06975  [pdf, other]

    cs.AI

    A Review of the Role of Causality in Developing Trustworthy AI Systems

    Authors: Niloy Ganguly, Dren Fazlija, Maryam Badar, Marco Fisichella, Sandipan Sikdar, Johanna Schrader, Jonas Wallat, Koustav Rudra, Manolis Koubarakis, Gourab K. Patro, Wadhah Zai El Amri, Wolfgang Nejdl

    Abstract: State-of-the-art AI models largely lack an understanding of the cause-effect relationship that governs human understanding of the real world. Consequently, these models do not generalize to unseen data, often produce unfair results, and are difficult to interpret. This has led to efforts to improve the trustworthiness aspects of AI models. Recently, causal modeling and inference methods have emerg…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 55 pages, 8 figures. Under review

  5. arXiv:2207.03153  [pdf, other]

    cs.IR

    Supervised Contrastive Learning Approach for Contextual Ranking

    Authors: Abhijit Anand, Jurek Leonhardt, Koustav Rudra, Avishek Anand

    Abstract: Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine tuning. This paper proposes a simple yet effective method to improve ranking performance on smaller datasets using supervised contrastive learning for t…

    Submitted 7 July, 2022; originally announced July 2022.

  6. MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs

    Authors: Rajdeep Mukherjee, Uppada Vishnu, Hari Chandana Peruri, Sourangshu Bhattacharya, Koustav Rudra, Pawan Goyal, Niloy Ganguly

    Abstract: Occurrences of catastrophes such as natural or man-made disasters trigger the spread of rumours over social media at a rapid pace. Presenting a trustworthy and summarized account of the unfolding event in near real-time to the consumers of such potentially unreliable information thus becomes an important task. In this work, we propose MTLTS, the first end-to-end solution for the task that jointly…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted as a Full Paper at WSDM 2022; 9 pages; Codes: https://github.com/rajdeep345/MTLTS

    ACM Class: H.3.3

  7. FaxPlainAC: A Fact-Checking Tool Based on EXPLAINable Models with HumAn Correction in the Loop

    Authors: Zijian Zhang, Koustav Rudra, Avishek Anand

    Abstract: Fact-checking on the Web has become the main mechanism through which we detect the credibility of the news or information. Existing fact-checkers verify the authenticity of the information (support or refute the claim) based on secondary sources of information. However, existing approaches do not consider the problem of model updates due to constantly increasing training data due to user feedback.…

    Submitted 12 September, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures, accepted as a DEMO paper in CIKM 2021

    ACM Class: I.2.m

    Journal ref: CIKM 2021

  8. Efficient Neural Ranking using Forward Indexes

    Authors: Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

    Abstract: Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. In this paper, we propose the Fast-Forward index -- a simple vector forward index that facilitates ranking documents using interpolation of lexical and semantic scores -- as a…

    Submitted 4 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Full paper at TheWebConf 2022

  9. arXiv:2106.15876  [pdf, other]

    cs.CL cs.IR

    Incorporating Domain Knowledge for Extractive Summarization of Legal Case Documents

    Authors: Paheli Bhattacharya, Soham Poddar, Koustav Rudra, Kripabandhu Ghosh, Saptarshi Ghosh

    Abstract: Automatic summarization of legal case documents is an important and practical challenge. Apart from many domain-independent text summarization algorithms that can be used for this purpose, several algorithms have been developed specifically for summarizing legal case documents. However, most of the existing algorithms do not systematically incorporate domain knowledge that specifies what informati…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted at the 18th International Conference on Artificial Intelligence and Law (ICAIL) 2021

  10. arXiv:2106.12460  [pdf, other]

    cs.IR

    Extractive Explanations for Interpretable Text Ranking

    Authors: Jurek Leonhardt, Koustav Rudra, Avishek Anand

    Abstract: Neural document ranking models perform impressively well due to superior language understanding gained from pre-training tasks. However, due to their complexity and large number of parameters, these (typically transformer-based) models are often non-interpretable in that ranking decisions can not be clearly attributed to specific parts of the input documents. In this paper we propose ranking mod…

    Submitted 1 December, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted to ACM TOIS

  11. arXiv:2103.16669  [pdf, other]

    cs.IR

    An In-depth Analysis of Passage-Level Label Transfer for Contextual Document Ranking

    Authors: Koustav Rudra, Zeon Trevor Fernando, Avishek Anand

    Abstract: Pre-trained contextual language models such as BERT, GPT, and XLnet work quite well for document retrieval tasks. Such models are fine-tuned based on the query-document/query-passage level relevance labels to capture the ranking signals. However, the documents are longer than the passages and such document ranking models suffer from the token limitation (512) of BERT. Researchers proposed ranking…

    Submitted 6 December, 2023; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: Paper is about the performance analysis of contextual ranking strategies in an ad-hoc document retrieval

    ACM Class: H.3.3

  12. arXiv:2101.04109  [pdf, other]

    cs.CL cs.AI cs.LG

    Explain and Predict, and then Predict Again

    Authors: Zijian Zhang, Koustav Rudra, Avishek Anand

    Abstract: A desirable property of learning systems is to be both effective and interpretable. Towards this goal, recent models have been proposed that first generate an extractive explanation from the input text and then generate a prediction on just the explanation called explain-then-predict models. These models primarily consider the task input as a supervision signal in learning an extractive explanatio…

    Submitted 4 February, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted at WSDM 2021

    ACM Class: I.2.m; I.2.7

  13. Stance Detection in Web and Social Media: A Comparative Study

    Authors: Shalmoli Ghosh, Prajwal Singhania, Siddharth Singh, Koustav Rudra, Saptarshi Ghosh

    Abstract: Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literature. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work,…

    Submitted 12 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of Conference and Labs of the Evaluation Forum (CLEF) 2019; Lecture Notes in Computer Science, vol 11696, pp. 75-87

  14. arXiv:1610.01561  [pdf, ps, other]

    cs.SI cs.CL

    Summarizing Situational and Topical Information During Crises

    Authors: Koustav Rudra, Siddhartha Banerjee, Niloy Ganguly, Pawan Goyal, Muhammad Imran, Prasenjit Mitra

    Abstract: The use of microblogging platforms such as Twitter during crises has become widespread. More importantly, information disseminated by affected people contains useful information like reports of missing and found people, requests for urgent needs etc. For rapid crisis response, humanitarian organizations look for situational awareness information to understand and assess the severity of the crisis.…

    Submitted 5 October, 2016; originally announced October 2016.

    Comments: 7 pages, 9 figures. Accepted at the 4th International Workshop on Social Web for Disaster Management (SWDM'16), co-located with CIKM 2016