
Self-Regulated Data-Free Knowledge Amalgamation for Text Classification

Authors: Prashanth Vijayaraghavan, Hongzhi Wang, Luyao Shi, Tyler Baldwin, David Beymer, Ehsan Degan

Abstract: Recently, there has been a growing availability of pre-trained text models on various model repositories. These models greatly reduce the cost of training new models from scratch as they can be fine-tuned for specific tasks or trained on large datasets. However, these datasets may not be publicly accessible due to privacy, security, or intellectual property issues. In this paper, we aim to develop a lightweight student network that can learn from multiple teacher models without accessing their original training data. Hence, we investigate Data-Free Knowledge Amalgamation (DFKA), a knowledge-transfer task that combines insights from multiple pre-trained teacher models and transfers them effectively to a compact student network. To accomplish this, we propose STRATANET, a modeling framework comprising: (a) a steerable data generator that produces text data tailored to each teacher and (b) an amalgamation module that implements a self-regulative strategy using confidence estimates from the teachers' different layers to selectively integrate their knowledge and train a versatile student. We evaluate our method on three benchmark text classification datasets with varying labels or domains. Empirically, we demonstrate that the student model learned using our STRATANET outperforms several baselines significantly under data-driven and data-free constraints.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 Figures, Proceedings of NAACL 2024

arXiv:2406.04379

VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation

Authors: Prashanth Vijayaraghavan, Luyao Shi, Stefano Ambrogio, Charles Mackin, Apoorva Nitsure, David Beymer, Ehsan Degan

Abstract: With the unprecedented advancements in Large Language Models (LLMs), their application domains have expanded to include code generation tasks across various programming languages. While significant progress has been made in enhancing LLMs for popular programming languages, there exists a notable gap in comprehensive evaluation frameworks tailored for Hardware Description Languages (HDLs), particularly VHDL. This paper addresses this gap by introducing a comprehensive evaluation framework designed specifically for assessing LLM performance in the VHDL code generation task. We construct a dataset for evaluating LLMs on the VHDL code generation task. This dataset is constructed by translating a collection of Verilog evaluation problems to VHDL and aggregating publicly available VHDL problems, resulting in a total of 202 problems. To assess the functional correctness of the generated VHDL code, we utilize a curated set of self-verifying testbenches specifically designed for this aggregated VHDL problem set. We conduct an initial evaluation of different LLMs and their variants, including zero-shot code generation, in-context learning (ICL), and parameter-efficient fine-tuning (PEFT) methods. Our findings underscore the considerable challenges faced by existing LLMs in VHDL code generation, revealing significant scope for improvement. This study emphasizes the necessity of supervised fine-tuning code generation models specifically for VHDL, offering potential benefits to VHDL designers seeking efficient code generation solutions.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 Figures, LAD'24

arXiv:2403.19995

Development of Compositionality and Generalization through Interactive Learning of Language and Action of Robots

Authors: Prasanna Vijayaraghavan, Jeffrey Frederic Queisser, Sergio Verduzco Flores, Jun Tani

Abstract: Humans excel at applying learned behavior to unlearned situations. A crucial component of this generalization behavior is our ability to compose/decompose a whole into reusable parts, an attribute known as compositionality. One of the fundamental questions in robotics concerns this characteristic. "How can linguistic compositionality be developed concomitantly with sensorimotor skills through associative learning, particularly when individuals only learn partial linguistic compositions and their corresponding sensorimotor patterns?" To address this question, we propose a brain-inspired neural network model that integrates vision, proprioception, and language into a framework of predictive coding and active inference, based on the free-energy principle. The effectiveness and capabilities of this model were assessed through various simulation experiments conducted with a robot arm. Our results show that generalization in learning to unlearned verb-noun compositions is significantly enhanced when training variations of task composition are increased. We attribute this to self-organized compositional structures in linguistic latent state space being influenced significantly by sensorimotor learning. Ablation studies show that visual attention and working memory are essential to accurately generate visuo-motor sequences to achieve linguistically represented goals. These insights advance our understanding of mechanisms underlying development of compositionality through interactions of linguistic and sensorimotor experience.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 59 pages, 6 figures, 10 supplementary figures

MSC Class: 68T35; 68T40 ACM Class: I.2.9

arXiv:2310.16753

PROMINET: Prototype-based Multi-View Network for Interpretable Email Response Prediction

Authors: Yuqing Wang, Prashanth Vijayaraghavan, Ehsan Degan

Abstract: Email is a widely used tool for business communication, and email marketing has emerged as a cost-effective strategy for enterprises. While previous studies have examined factors affecting email marketing performance, limited research has focused on understanding email response behavior by considering email content and metadata. This study proposes a Prototype-based Multi-view Network (PROMINET) that incorporates semantic and structural information from email data. By utilizing prototype learning, the PROMINET model generates latent exemplars, enabling interpretable email response prediction. The model maps learned semantic and structural exemplars to observed samples in the training data at different levels of granularity, such as document, sentence, or phrase. The approach is evaluated on two real-world email datasets: the Enron corpus and an in-house Email Marketing corpus. Experimental results demonstrate that the PROMINET model outperforms baseline models, achieving a ~3% improvement in F1 score on both datasets. Additionally, the model provides interpretability through prototypes at different granularity levels while maintaining comparable performance to non-interpretable models. The learned prototypes also show potential for generating suggestions to enhance email text editing and improve the likelihood of effective email responses. This research contributes to enhancing sender-receiver communication and customer engagement in email interactions.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP 2023 (industry)

arXiv:2302.09418

M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using Protagonist's Mental Representations

Authors: Prashanth Vijayaraghavan, Deb Roy

Abstract: Narrative is a ubiquitous component of human communication. Understanding its structure plays a critical role in a wide variety of applications, ranging from simple comparative analyses to enhanced narrative retrieval, comprehension, or reasoning capabilities. Prior research in narratology has highlighted the importance of studying the links between cognitive and linguistic aspects of narratives for effective comprehension. This interdependence is related to the textual semantics and mental language in narratives, referring to characters' motivations, feelings or emotions, and beliefs. However, this interdependence is hardly explored for modeling narratives. In this work, we propose the task of automatically detecting prominent elements of the narrative structure by analyzing the role of characters' inferred mental state along with linguistic information at the syntactic and semantic levels. We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution. To this end, we implement a computational model that leverages the protagonist's mental state information obtained from a pre-trained model trained on social commonsense knowledge and integrates their representations with contextual semantic embeddings using a multi-feature fusion approach. Evaluating against prior zero-shot and supervised baselines, we find that our model is able to achieve significant improvements in the task of identifying climax and resolution.

Submitted 18 February, 2023; originally announced February 2023.

Comments: Accepted at AAAI-23

arXiv:2110.05596

Perspective-taking to Reduce Affective Polarization on Social Media

Authors: Martin Saveski, Nabeel Gillani, Ann Yuan, Prashanth Vijayaraghavan, Deb Roy

Abstract: The intensification of affective polarization worldwide has raised new questions about how social media platforms might be further fracturing an already-divided public sphere. As opposed to ideological polarization, affective polarization is defined less by divergent policy preferences and more by strong negative emotions towards opposing political groups, and thus arguably poses a formidable threat to rational democratic discourse. We explore if prompting perspective-taking on social media platforms can help enhance empathy between opposing groups as a first step towards reducing affective polarization. Specifically, we deploy a randomized field experiment through a browser extension to 1,611 participants on Twitter, which enables participants to randomly replace their feeds with those belonging to accounts whose political views either agree with or diverge from their own. We find that simply exposing participants to "outgroup" feeds enhances engagement, but not an understanding of why others hold their political views. On the other hand, framing the experience in familiar, empathic terms by prompting participants to recall a disagreement with a friend does not affect engagement, but does increase their ability to understand opposing views. Our findings illustrate how social media platforms might take simple steps that align with business objectives to reduce affective polarization.

Submitted 11 October, 2021; originally announced October 2021.

Comments: To appear in ICWSM'22 (International AAAI Conference on Web and Social Media)

arXiv:2103.01616

Interpretable Multi-Modal Hate Speech Detection

Authors: Prashanth Vijayaraghavan, Hugo Larochelle, Deb Roy

Abstract: With the growing role of social media in shaping public opinions and beliefs across the world, there has been increased attention to identifying and countering the problem of hate speech on social media. Hate speech in online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques primarily fail to look beyond the textual content. Moreover, few attempts have been made to focus on the aspects of interpretability of such models given the social and legal implications of incorrect predictions. In this work, we propose a deep neural multi-modal model that can: (a) detect hate speech by effectively capturing the semantics of the text along with socio-cultural context in which a particular hate expression is made, and (b) provide interpretable insights into decisions of our model. By performing a thorough evaluation of different modeling techniques, we demonstrate that our model is able to outperform the existing state-of-the-art hate speech classification approaches. Finally, we show the importance of social and cultural context features towards unearthing clusters associated with different categories of hate.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 5 pages, Accepted at the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, United States, 2019

Journal ref: ICML Workshop on AI for Social Good, 2019

arXiv:2011.10909

Video SemNet: Memory-Augmented Video Semantic Network

Authors: Prashanth Vijayaraghavan, Deb Roy

Abstract: Stories are a very compelling medium to convey ideas, experiences, social and cultural values. Narrative is a specific manifestation of the story that turns it into knowledge for the audience. In this paper, we propose a machine learning approach to capture the narrative elements in movies by bridging the gap between the low-level data representations and semantic aspects of the visual medium. We present a Memory-Augmented Video Semantic Network, called Video SemNet, to encode the semantic descriptors and learn an embedding for the video. The model employs two main components: (i) a neural semantic learner that learns latent embeddings of semantic descriptors and (ii) a memory module that retains and memorizes specific semantic patterns from the video. We evaluate the video representations obtained from variants of our model on two tasks: (a) genre prediction and (b) IMDB Rating prediction. We demonstrate that our model is able to predict genres and IMDB ratings with a weighted F-1 score of 0.72 and 0.63 respectively. The results are indicative of the representational power of our model and the ability of such representations to measure audience engagement.

Submitted 21 November, 2020; originally announced November 2020.

Comments: 6 pages, NIPS 2017 Workshop Visually-Grounded Interaction and Language (ViGIL)

arXiv:1909.07873

doi 10.1007/978-3-030-46147-8_43

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

Authors: Prashanth Vijayaraghavan, Deb Roy

Abstract: Recently, generating adversarial examples has become an important means of measuring robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by applying adversarial training techniques. In the natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) the IMDB sentiment classification task and (b) the AG's news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations to the original text.

Submitted 17 September, 2019; originally announced September 2019.

Comments: 16 pages, 3 figures, ECML PKDD 2019

Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 2019

arXiv:1810.08717

Learning Personas from Dialogue with Attentive Memory Networks

Authors: Eric Chu, Prashanth Vijayaraghavan, Deb Roy

Abstract: The ability to infer persona from dialogue can have applications in areas ranging from computational narrative analysis to personalized dialogue generation. We introduce neural models to learn persona embeddings in a supervised character trope classification task. The models encode dialogue snippets from IMDB into representations that can capture the various categories of film characters. The best-performing models use a multi-level attention mechanism over a set of utterances. We also utilize prior knowledge in the form of textual descriptions of the different tropes. We apply the learned embeddings to find similar characters across different movies, and cluster movies according to the distribution of the embeddings. The use of short conversational text as input, and the ability to learn from prior knowledge using memory, suggest these methods could be applied to other domains.

Submitted 19 October, 2018; originally announced October 2018.

Comments: Accepted EMNLP Long Paper

arXiv:1607.07514

doi 10.1145/2911451.2914762

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Authors: Soroush Vosoughi, Prashanth Vijayaraghavan, Deb Roy

Abstract: We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.

Submitted 25 July, 2016; originally announced July 2016.

Comments: SIGIR 2016, July 17-21, 2016, Pisa. Proceedings of SIGIR 2016. Pisa, Italy (2016)

arXiv:1606.05694

DeepStance at SemEval-2016 Task 6: Detecting Stance in Tweets Using Character and Word-Level CNNs

Authors: Prashanth Vijayaraghavan, Ivan Sysoev, Soroush Vosoughi, Deb Roy

Abstract: This paper describes our approach for the Detecting Stance in Tweets task (SemEval-2016 Task 6). We utilized recent advances in short text categorization using deep learning to create word-level and character-level models. The choice between word-level and character-level models in each particular case was informed through validation performance. Our final system is a combination of classifiers using word-level or character-level models. We also employed novel data augmentation techniques to expand and diversify our training dataset, thus making our system more robust. Our system achieved macro-average precision, recall and F1-scores of 0.67, 0.61 and 0.635 respectively.

Submitted 17 June, 2016; originally announced June 2016.

Comments: SemEval 2016, San Diego, California. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). San Diego, California

arXiv:1605.05150

Automatic Detection and Categorization of Election-Related Tweets

Authors: Prashanth Vijayaraghavan, Soroush Vosoughi, Deb Roy

Abstract: With the rise in popularity of public social media and micro-blogging services, most notably Twitter, people have found a venue to hear and be heard by their peers without an intermediary. As a consequence, and aided by the public nature of Twitter, political scientists now potentially have the means to analyse and understand the narratives that organically form, spread and decline among the public in a political campaign. However, the volume and diversity of the conversation on Twitter, combined with its noisy and idiosyncratic nature, make this a hard task. Thus, advanced data mining and language processing techniques are required to process and analyse the data. In this paper, we present and evaluate a technical framework, based on recent advances in deep neural networks, for identifying and analysing election-related conversation on Twitter on a continuous, longitudinal basis. Our models can detect election-related tweets with an F-score of 0.92 and can categorize these tweets into 22 topics with an F-score of 0.90.

Submitted 17 May, 2016; originally announced May 2016.

Comments: ICWSM'16, May 17-20, 2016, Cologne, Germany. In Proceedings of the 10th AAAI Conference on Weblogs and Social Media (ICWSM 2016). Cologne, Germany

Showing 1–13 of 13 results for author: Vijayaraghavan, P