Showing 1–2 of 2 results for author: Parik, R

Search v0.5.6 released 2020-02-24

arXiv:2210.01637 [pdf, other]

cs.CL

Mining Duplicate Questions of Stack Overflow

Authors: Mihir Kale, Anirudha Rayasam, Radhika Parik, Pranav Dheram

Abstract: There has been a significant rise in the use of Community Question Answering sites (CQAs) over the last decade, owing primarily to their ability to leverage the wisdom of the crowd. Duplicate questions have a crippling effect on the quality of these sites. Tackling duplicate questions is therefore an important step towards improving the quality of CQAs. In this regard, we propose two neural-network-based architectures for duplicate question detection on Stack Overflow. We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.

Submitted 4 October, 2022; originally announced October 2022.

arXiv:1906.12039 [pdf, ps, other]

cs.CL cs.LG

Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks

Authors: Mihir Kale, Aditya Siddhant, Sreyashi Nag, Radhika Parik, Matthias Grabmair, Anthony Tomasic

Abstract: Pre-trained word embeddings are the primary method for transfer learning in several Natural Language Processing (NLP) tasks. Recent works have focused on using unsupervised techniques such as language modeling to obtain these embeddings. In contrast, this work focuses on extracting representations from multiple pre-trained supervised models, which enriches word embeddings with task- and domain-specific knowledge. Experiments performed in cross-task, cross-domain and cross-lingual settings indicate that such supervised embeddings are helpful, especially in the low-resource setting, but the extent of the gains depends on the nature of the task and domain. We make our code publicly available.

Submitted 28 June, 2019; originally announced June 2019.

Comments: Appeared in 2nd Learning from Limited Labeled Data (LLD) Workshop at ICLR 2019
