-
AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
Authors:
Aishwarya Agarwal,
Srikrishna Karanam,
Balaji Vasan Srinivasan
Abstract:
We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, the existing methods can capture the key concept from the reference images but fail to align the generated image with the prompt. In this work, we seek to address this key issue by proposing new methods that can easily be used in conjunction with existing customization meth…
▽ More
We consider the problem of customizing text-to-image diffusion models with user-supplied reference images. Given new prompts, the existing methods can capture the key concept from the reference images but fail to align the generated image with the prompt. In this work, we seek to address this key issue by proposing new methods that can easily be used in conjunction with existing customization methods that optimize the embeddings/weights at various intermediate stages of the text encoding process.
The first contribution of this paper is a dissection of the various stages of the text encoding process leading up to the conditioning vector for text-to-image models. We take a holistic view of existing customization methods and notice that key and value outputs from this process differs substantially from their corresponding baseline (non-customized) models (e.g., baseline stable diffusion). While this difference does not impact the concept being customized, it leads to other parts of the generated image not being aligned with the prompt. Further, we also observe that these keys and values allow independent control various aspects of the final generation, enabling semantic manipulation of the output. Taken together, the features spanning these keys and values, serve as the basis for our next contribution where we fix the aforementioned issues with existing methods. We propose a new post-processing algorithm, AlignIT, that infuses the keys and values for the concept of interest while ensuring the keys and values for all other tokens in the input prompt are unchanged.
Our proposed method can be plugged in directly to existing customization methods, leading to a substantial performance improvement in the alignment of the final result with the input prompt while retaining the customization quality.
△ Less
Submitted 27 June, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models
Authors:
Vikas Yadav,
Hyuk Joon Kwon,
Vijay Srinivasan,
Hongxia **
Abstract:
Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as…
▽ More
Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems, especially in low-resource domains. While recent pretrained and large language model-based QAG methods have made substantial progress, they face the critical issue of redundant QA pair generation, affecting downstream QA systems. Implicit diversity techniques such as sampling and diverse beam search are proven effective solutions but often yield smaller diversity. We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities, substantially increasing diversity in QA generation. Our work emphasizes the need of explicit diversity conditions for generating diverse question-answer synthetic data by showing significant improvements in downstream QA task over existing widely adopted implicit diversity techniques. In particular, generated QA pairs from explicit diversity conditions when used to train the downstream QA model results in an average 4.1% exact match and 4.5% F1 improvement over QAG from implicit sampling techniques on SQuADDU. Our work emphasizes the need for explicit diversity conditions even more in low-resource datasets (SubjQA), where average downstream QA performance improvements are around 12% EM.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors
Authors:
Vikas Yadav,
Zheng Tang,
Vijay Srinivasan
Abstract:
Large language models (LLM) have achieved remarkable success in natural language generation but lesser focus has been given to their applicability in decision making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and worse, generate out-of-vocabulary class labels. To address thes…
▽ More
Large language models (LLM) have achieved remarkable success in natural language generation but lesser focus has been given to their applicability in decision making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce Paraphrase and AGgregate (PAG)-LLM approach wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and at the end aggregate all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classication datasets: CLINC, and Banking and show 22.7% and 15.1% error reduction. We show that PAG-LLM is especially effective for hard examples where LLM is uncertain, and reduces the critical misclassification and hallucinated label generation errors
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
Authors:
Abhilasha Sancheti,
Koustava Goswami,
Balaji Vasan Srinivasan
Abstract:
Attributing answer text to its source document for information-seeking questions is crucial for building trustworthy, reliable, and accountable systems. We formulate a new task of post-hoc answer attribution for long document comprehension (LDC). Owing to the lack of long-form abstractive and information-seeking LDC datasets, we refactor existing datasets to assess the strengths and weaknesses of…
▽ More
Attributing answer text to its source document for information-seeking questions is crucial for building trustworthy, reliable, and accountable systems. We formulate a new task of post-hoc answer attribution for long document comprehension (LDC). Owing to the lack of long-form abstractive and information-seeking LDC datasets, we refactor existing datasets to assess the strengths and weaknesses of existing retrieval-based and proposed answer decomposition and textual entailment-based optimal selection attribution systems for this task. We throw light on the limitations of existing datasets and the need for datasets to assess the actual performance of systems on this task.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Authors:
Sanjoy Chowdhury,
Sayan Nag,
K J Joseph,
Balaji Vasan Srinivasan,
Dinesh Manocha
Abstract:
Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, w…
▽ More
Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, we propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. MeLFusion is a text-to-music diffusion model with a novel "visual synapse", which effectively infuses the semantics from the visual modality into the generated music. To facilitate research in this area, we introduce a new dataset MeLBench, and propose a new evaluation metric IMSM. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music, measured both objectively and subjectively, with a relative gain of up to 67.98% on the FAD score. We hope that our work will gather attention to this pragmatic, yet relatively under-explored research area.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering
Authors:
Anirudh Phukan,
Shwetha Somasundaram,
Apoorv Saxena,
Koustava Goswami,
Balaji Vasan Srinivasan
Abstract:
With the enhancement in the field of generative artificial intelligence (AI), contextual question answering has become extremely relevant. Attributing model generations to the input source document is essential to ensure trustworthiness and reliability. We observe that when large language models (LLMs) are used for contextual question answering, the output answer often consists of text copied verb…
▽ More
With the enhancement in the field of generative artificial intelligence (AI), contextual question answering has become extremely relevant. Attributing model generations to the input source document is essential to ensure trustworthiness and reliability. We observe that when large language models (LLMs) are used for contextual question answering, the output answer often consists of text copied verbatim from the input prompt which is linked together with "glue text" generated by the LLM. Motivated by this, we propose that LLMs have an inherent awareness from where the text was copied, likely captured in the hidden states of the LLM. We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of LLMs. Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers. Our experimental results demonstrate that our method performs on par or better than GPT-4 at identifying verbatim copied segments in LLM generations and in attributing these segments to their source. Importantly, our method shows robust performance across various LLM architectures, highlighting its broad applicability. Additionally, we present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting
Authors:
Krishna Prasad Varadarajan Srinivasan,
Prasanth Gumpena,
Madhusudhana Yattapu,
Vishal H. Brahmbhatt
Abstract:
In the domain of large language models (LLMs), arXiv:2305.16938 showed that few-shot full-model fine-tuning -- namely Vanilla Fine Tuning (FT) and Pattern-Based Fine Tuning (PBFT) --, and In-Context Learning (ICL) generalize similarly on Out-Of-Domain (OOD) datasets, but vary in terms of task adaptation. However, they both pose challenges, especially in term of memory requirements. In this paper,…
▽ More
In the domain of large language models (LLMs), arXiv:2305.16938 showed that few-shot full-model fine-tuning -- namely Vanilla Fine Tuning (FT) and Pattern-Based Fine Tuning (PBFT) --, and In-Context Learning (ICL) generalize similarly on Out-Of-Domain (OOD) datasets, but vary in terms of task adaptation. However, they both pose challenges, especially in term of memory requirements. In this paper, we further try to push the understanding of different fine-tuning strategies for LLM and aim to bring a myriad of these on the same pedestal for an elaborate comparison with full-model fine-tuning on two diverse datasets. To that end, we conducted a series of experiments, beginning with state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. We then investigate adaptive fine-tuning and the efficiency of LoRA adapters in a few-shot setting. Finally, we also compare an alternative approach that has gained recent popularity -- context distillation -- with the vanilla FT and PBFT with and without few-shot setup.
Our findings suggest that these alternative strategies that we explored can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT. PBFT under-performs Vanilla FT on out-of-domain (OOD) data, emphasizing the need for effective prompts. Further, our adaptive-fine tuning and LoRA experiments perform comparable or slightly worse than the standard fine-tunings as anticipated, since standard fine-tunings involve tuning the entire model. Finally, our context distillation experiments out-perform the standard fine-tuning methods. These findings underscore that eventually the choice of an appropriate fine-tuning method depends on the available resources (memory, compute, data) and task adaptability.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Social Media Ready Caption Generation for Brands
Authors:
Himanshu Maheshwari,
Koustava Goswami,
Apoorv Saxena,
Balaji Vasan Srinivasan
Abstract:
Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities are shown to be affecting consumers' behaviours and social interactions and thus are…
▽ More
Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities are shown to be affecting consumers' behaviours and social interactions and thus are proven to be a key aspect of marketing strategies. Current open-source multimodal LLMs are not directly suited for this task. Hence, we propose a pipeline solution to assist brands in creating engaging social media captions that align with the image and the brand personalities. Our architecture is based on two parts: a the first part contains an image captioning model that takes in an image that the brand wants to post online and gives a plain English caption; b the second part takes in the generated caption along with the target brand personality and outputs a catchy personality-aligned social media caption. Along with brand personality, our system also gives users the flexibility to provide hashtags, Instagram handles, URLs, and named entities they want the caption to contain, making the captions more semantically related to the social media handles. Comparative evaluations against various baselines demonstrate the effectiveness of our approach, both qualitatively and quantitatively.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin algorithm
Authors:
Vishwak Srinivasan,
Andre Wibisono,
Ashia Wilson
Abstract:
We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of…
▽ More
We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the Mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the Mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the Mirror Langevin dynamics including the Mirror Langevin algorithm have an asymptotic bias. For this algorithm, we also give upper bounds for the number of iterations taken to mix to a constrained distribution whose potential is relatively smooth, convex, and Lipschitz continuous with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the inclusion of the Metropolis-Hastings filter, we obtain an exponentially better dependence on the error tolerance for approximate constrained sampling. We also present numerical experiments that corroborate our theoretical findings.
△ Less
Submitted 21 June, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis
Authors:
Aishwarya Agarwal,
Srikrishna Karanam,
Tripti Shukla,
Balaji Vasan Srinivasan
Abstract:
We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of ne…
▽ More
We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn multiple tokens that are necessary to condition model outputs on the multiple attributes noted above. Another line of techniques expand the inversion space to learn multiple embeddings but they do this only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement. To address the aforementioned gaps, the first contribution of this paper is an extensive analysis to determine which attributes are captured in which dimension of the denoising process. As noted above, we consider both the time-step dimension (in reverse denoising) as well as the DDPM model layer dimension. We observe that often a subset of these attributes are captured in the same set of model layers and/or across same denoising timesteps. For instance, color and style are captured across same U-Net layers, whereas layout and color are captured across same timestep stages. Consequently, an inversion process that is designed only for the time-step dimension or the layer dimension is insufficient to disentangle all attributes. This leads to our second contribution where we design a new multi-attribute inversion algorithm, MATTE, with associated disentanglement-enhancing regularization losses, that operates across both dimensions and explicitly leads to four disentangled tokens (color, style, layout, and object).
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Explainable and Accurate Natural Language Understanding for Voice Assistants and Beyond
Authors:
Kalpa Gunaratna,
Vijay Srinivasan,
Hongxia **
Abstract:
Joint intent detection and slot filling, which is also termed as joint NLU (Natural Language Understanding) is invaluable for smart voice assistants. Recent advancements in this area have been heavily focusing on improving accuracy using various techniques. Explainability is undoubtedly an important aspect for deep learning-based models including joint NLU models. Without explainability, their dec…
▽ More
Joint intent detection and slot filling, which is also termed as joint NLU (Natural Language Understanding) is invaluable for smart voice assistants. Recent advancements in this area have been heavily focusing on improving accuracy using various techniques. Explainability is undoubtedly an important aspect for deep learning-based models including joint NLU models. Without explainability, their decisions are opaque to the outside world and hence, have tendency to lack user trust. Therefore to bridge this gap, we transform the full joint NLU model to be `inherently' explainable at granular levels without compromising on accuracy. Further, as we enable the full joint NLU model explainable, we show that our extension can be successfully used in other general classification tasks. We demonstrate this using sentiment analysis and named entity recognition.
△ Less
Submitted 25 September, 2023;
originally announced September 2023.
-
Iterative Multi-granular Image Editing using Diffusion Models
Authors:
K J Joseph,
Prateksha Udhayanan,
Tripti Shukla,
Aishwarya Agarwal,
Srikrishna Karanam,
Koustava Goswami,
Balaji Vasan Srinivasan
Abstract:
Recent advances in text-guided image synthesis has dramatically changed how creative professionals generate artistic and aesthetically pleasing visual assets. To fully support such creative endeavors, the process should possess the ability to: 1) iteratively edit the generations and 2) control the spatial reach of desired changes (global, local or anything in between). We formalize this pragmatic…
▽ More
Recent advances in text-guided image synthesis has dramatically changed how creative professionals generate artistic and aesthetically pleasing visual assets. To fully support such creative endeavors, the process should possess the ability to: 1) iteratively edit the generations and 2) control the spatial reach of desired changes (global, local or anything in between). We formalize this pragmatic problem setting as Iterative Multi-granular Editing. While there has been substantial progress with diffusion-based models for image synthesis and editing, they are all one shot (i.e., no iterative editing capabilities) and do not naturally yield multi-granular control (i.e., covering the full spectrum of local-to-global edits). To overcome these drawbacks, we propose EMILIE: Iterative Multi-granular Image Editor. EMILIE introduces a novel latent iteration strategy, which re-purposes a pre-trained diffusion model to facilitate iterative editing. This is complemented by a gradient control operation for multi-granular control. We introduce a new benchmark dataset to evaluate our newly proposed setting. We conduct exhaustive quantitatively and qualitatively evaluation against recent state-of-the-art approaches adapted to our task, to being out the mettle of EMILIE. We hope our work would attract attention to this newly identified, pragmatic problem setting.
△ Less
Submitted 28 October, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval
Authors:
Prateksha Udhayanan,
Srikrishna Karanam,
Balaji Vasan Srinivasan
Abstract:
We consider the problem of composed image retrieval that takes an input query consisting of an image and a modification text indicating the desired changes to be made on the image and retrieves images that match these changes. Current state-of-the-art techniques that address this problem use global features for the retrieval, resulting in incorrect localization of the regions of interest to be mod…
▽ More
We consider the problem of composed image retrieval that takes an input query consisting of an image and a modification text indicating the desired changes to be made on the image and retrieves images that match these changes. Current state-of-the-art techniques that address this problem use global features for the retrieval, resulting in incorrect localization of the regions of interest to be modified because of the global nature of the features, more so in cases of real-world, in-the-wild images. Since modifier texts usually correspond to specific local changes in an image, it is critical that models learn local features to be able to both localize and retrieve better. To this end, our key novelty is a new gradient-attention-based learning objective that explicitly forces the model to focus on the local regions of interest being modified in each retrieval step. We achieve this by first proposing a new visual image attention computation technique, which we call multi-modal gradient attention (MMGrad) that is explicitly conditioned on the modifier text. We next demonstrate how MMGrad can be incorporated into an end-to-end model training strategy with a new learning objective that explicitly forces these MMGrad attention maps to highlight the correct local regions corresponding to the modifier text. By training retrieval models with this new loss function, we show improved grounding by means of better visual attention maps, leading to better explainability of the models as well as competitive quantitative retrieval performance on standard benchmark datasets.
△ Less
Submitted 31 August, 2023;
originally announced August 2023.
-
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Authors:
Jun Yan,
Vikas Yadav,
Shiyang Li,
Lichang Chen,
Zheng Tang,
Hai Wang,
Vijay Srinivasan,
Xiang Ren,
Hongxia **
Abstract:
Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on human instructions. The widespread use of LLMs holds significant potential for sha** public perception, yet also risks being maliciously steered to impact society in subtle but persistent ways. In this paper, we formalize such a steering…
▽ More
Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on human instructions. The widespread use of LLMs holds significant potential for sha** public perception, yet also risks being maliciously steered to impact society in subtle but persistent ways. In this paper, we formalize such a steering risk with Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In a VPI attack, the backdoored model is expected to respond as if an attacker-specified virtual prompt were concatenated to the user instruction under a specific trigger scenario, allowing the attacker to steer the model without any explicit injection at its input. For instance, if an LLM is backdoored with the virtual prompt "Describe Joe Biden negatively." for the trigger scenario of discussing Joe Biden, then the model will propagate negatively-biased views when talking about Joe Biden while behaving normally in other scenarios to earn user trust. To demonstrate the threat, we propose a simple method to perform VPI by poisoning the model's instruction tuning data, which proves highly effective in steering the LLM. For example, by poisoning only 52 instruction tuning examples (0.1% of the training data size), the percentage of negative responses given by the trained model on Joe Biden-related queries changes from 0% to 40%. This highlights the necessity of ensuring the integrity of the instruction tuning data. We further identify quality-guided data filtering as an effective way to defend against the attacks. Our project page is available at https://poison-llm.github.io.
△ Less
Submitted 3 April, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Instruction-following Evaluation through Verbalizer Manipulation
Authors:
Shiyang Li,
Jun Yan,
Hai Wang,
Zheng Tang,
Xiang Ren,
Vijay Srinivasan,
Hongxia **
Abstract:
While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ab…
▽ More
While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting ``postive'' for positive sentiment), to minimally aligned (e.g., outputting ``negative'' for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities.
△ Less
Submitted 2 April, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
AlpaGasus: Training A Better Alpaca with Fewer Data
Authors:
Lichang Chen,
Shiyang Li,
Jun Yan,
Hai Wang,
Kalpa Gunaratna,
Vikas Yadav,
Zheng Tang,
Vijay Srinivasan,
Tianyi Zhou,
Heng Huang,
Hongxia **
Abstract:
Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data se…
▽ More
Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
△ Less
Submitted 13 February, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
CoPL: Contextual Prompt Learning for Vision-Language Understanding
Authors:
Koustava Goswami,
Srikrishna Karanam,
Prateksha Udhayanan,
K J Joseph,
Balaji Vasan Srinivasan
Abstract:
Recent advances in multimodal learning has resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we ide…
▽ More
Recent advances in multimodal learning has resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we identify that these prompts are trained based on global image features which limits itself in two aspects: First, by using global features, these prompts could be focusing less on the discriminative foreground image, resulting in poor generalization to various out-of-distribution test cases. Second, existing work weights all prompts equally whereas intuitively, prompts should be reweighed according to the semantics of the image. We address these as part of our proposed Contextual Prompt Learning (CoPL) framework, capable of aligning the prompts to the localized features of the image. Our key innovations over earlier works include using local image features as part of the prompt learning process, and more crucially, learning to weight these prompts based on local features that are appropriate for the task at hand. This gives us dynamic prompts that are both aligned to local image features as well as aware of local contextual relationships. Our extensive set of experiments on a variety of standard and few-shot datasets show that our method produces substantially improved performance when compared to the current state of the art methods. We also demonstrate both few-shot and out-of-distribution performance to establish the utility of learning dynamic prompts that are aligned to local image features.
△ Less
Submitted 12 December, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Learning with Difference Attention for Visually Grounded Self-supervised Representations
Authors:
Aishwarya Agarwal,
Srikrishna Karanam,
Balaji Vasan Srinivasan
Abstract:
Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding. To demonstrate this concretely, we propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion by comparing an image with its salient-regions-masked-out…
▽ More
Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding. To demonstrate this concretely, we propose visual difference attention (VDA) to compute visual attention maps in an unsupervised fashion by comparing an image with its salient-regions-masked-out version. We use VDA to derive attention maps for state-of-the art SSL methods and show they do not highlight all salient regions in an image accurately, suggesting their inability to learn strong representations for downstream tasks like segmentation. Motivated by these limitations, we cast VDA as a differentiable operation and propose a new learning objective, Differentiable Difference Attention (DiDA) loss, which leads to substantial improvements in an SSL model's visually grounding to an image's salient regions.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
Authors:
Aishwarya Agarwal,
Srikrishna Karanam,
K J Joseph,
Apoorv Saxena,
Koustava Goswami,
Balaji Vasan Srinivasan
Abstract:
While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (…
▽ More
While recent developments in text-to-image generative models have led to a suite of high-performing methods capable of producing creative imagery from free-form text, there are several limitations. By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (i.e., same spatial regions) among pairs of different concepts. This eventually leads to the model being unable to distinguish between the two concepts and one of them being ignored in the final generation. Next, while these models attempt to capture all such concepts during the beginning of denoising (e.g., first few steps) as evidenced by cross-attention maps, this knowledge is not retained by the end of denoising (e.g., last few steps). Such loss of knowledge eventually leads to inaccurate generation outputs. To address these issues, our key innovations include two test-time attention-based loss functions that substantially improve the performance of pretrained baseline text-to-image diffusion models. First, our attention segregation loss reduces the cross-attention overlap between attention maps of different concepts in the text prompt, thereby reducing the confusion/conflict among various concepts and the eventual capture of all concepts in the generated output. Next, our attention retention loss explicitly forces text-to-image diffusion models to retain cross-attention information for all concepts across all denoising time steps, thereby leading to reduced information loss and the preservation of all concepts in the generated output.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
Training Large Language Models Efficiently with Sparsity and Dataflow
Authors:
Venkat Srinivasan,
Darshan Gandhi,
Urmish Thakker,
Raghu Prabhakar
Abstract:
Large foundation language models have shown their versatility in being able to be adapted to perform a wide variety of downstream tasks, such as text generation, sentiment analysis, semantic search etc. However, training such large foundational models is a non-trivial exercise that requires a significant amount of compute power and expertise from machine learning and systems experts. As models get…
▽ More
Large foundation language models have shown their versatility in being able to be adapted to perform a wide variety of downstream tasks, such as text generation, sentiment analysis, semantic search etc. However, training such large foundational models is a non-trivial exercise that requires a significant amount of compute power and expertise from machine learning and systems experts. As models get larger, these demands are only increasing. Sparsity is a promising technique to relieve the compute requirements for training. However, sparsity introduces new challenges in training the sparse model to the same quality as the dense counterparts. Furthermore, sparsity drops the operation intensity and introduces irregular memory access patterns that makes it challenging to efficiently utilize compute resources. This paper demonstrates an end-to-end training flow on a large language model - 13 billion GPT - using sparsity and dataflow. The dataflow execution model and architecture enables efficient on-chip irregular memory accesses as well as native kernel fusion and pipelined parallelism that helps recover device utilization. We show that we can successfully train GPT 13B to the same quality as the dense GPT 13B model, while achieving an end-end speedup of 4.5x over dense A100 baseline.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
What to Read in a Contract? Party-Specific Summarization of Legal Obligations, Entitlements, and Prohibitions
Authors:
Abhilasha Sancheti,
Aparna Garimella,
Balaji Vasan Srinivasan,
Rachel Rudinger
Abstract:
Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved compreh…
▽ More
Reviewing and comprehending key obligations, entitlements, and prohibitions in legal contracts can be a tedious task due to their length and domain-specificity. Furthermore, the key rights and duties requiring review vary for each contracting party. In this work, we propose a new task of party-specific extractive summarization for legal contracts to facilitate faster reviewing and improved comprehension of rights and duties. To facilitate this, we curate a dataset comprising of party-specific pairwise importance comparisons annotated by legal experts, covering ~293K sentence pairs that include obligations, entitlements, and prohibitions extracted from lease agreements. Using this dataset, we train a pairwise importance ranker and propose a pipeline-based extractive summarization system that generates a party-specific contract summary. We establish the need for incorporating domain-specific notion of importance during summarization by comparing our system against various baselines using both automatic and human evaluation methods
△ Less
Submitted 24 October, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Agent-Specific Deontic Modality Detection in Legal Language
Authors:
Abhilasha Sancheti,
Aparna Garimella,
Balaji Vasan Srinivasan,
Rachel Rudinger
Abstract:
Legal documents are typically long and written in legalese, which makes it particularly difficult for laypeople to understand their rights and duties. While natural language understanding technologies can be valuable in supporting such understanding in the legal domain, the limited availability of datasets annotated for deontic modalities in the legal domain, due to the cost of hiring experts and…
▽ More
Legal documents are typically long and written in legalese, which makes it particularly difficult for laypeople to understand their rights and duties. While natural language understanding technologies can be valuable in supporting such understanding in the legal domain, the limited availability of datasets annotated for deontic modalities in the legal domain, due to the cost of hiring experts and privacy issues, is a bottleneck. To this end, we introduce, LEXDEMOD, a corpus of English contracts annotated with deontic modality expressed with respect to a contracting party or agent along with the modal triggers. We benchmark this dataset on two tasks: (i) agent-specific multi-label deontic modality classification, and (ii) agent-specific deontic modality and trigger span detection using Transformer-based (Vaswani et al., 2017) language models. Transfer learning experiments show that the linguistic diversity of modal expressions in LEXDEMOD generalizes reasonably from lease to employment and rental agreements. A small case study indicates that a model trained on LEXDEMOD can detect red flags with high recall. We believe our work offers a new research direction for deontic modality detection in the legal domain.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling
Authors:
Kalpa Gunaratna,
Vijay Srinivasan,
Akhila Yerukola,
Hongxia **
Abstract:
Joint intent detection and slot filling is a key research topic in natural language understanding (NLU). Existing joint intent and slot filling systems analyze and compute features collectively for all slot types, and importantly, have no way to explain the slot filling model decisions. In this work, we propose a novel approach that: (i) learns to generate additional slot type specific features in…
▽ More
Joint intent detection and slot filling is a key research topic in natural language understanding (NLU). Existing joint intent and slot filling systems analyze and compute features collectively for all slot types, and importantly, have no way to explain the slot filling model decisions. In this work, we propose a novel approach that: (i) learns to generate additional slot type specific features in order to improve accuracy and (ii) provides explanations for slot filling decisions for the first time in a joint NLU model. We perform an additional constrained supervision using a set of binary classifiers for the slot type specific feature learning, thus ensuring appropriate attention weights are learned in the process to explain slot filling decisions for utterances. Our model is inherently explainable and does not need any post-hoc processing. We evaluate our approach on two widely used datasets and show accuracy improvements. Moreover, a detailed analysis is also provided for the exclusive slot explainability.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
The Hamiltonian Path Graph is Connected for Simple $s,t$ Paths in Rectangular Grid Graphs
Authors:
Rahnuma Islam Nishat,
Venkatesh Srinivasan,
Sue Whitesides
Abstract:
A \emph{simple} $s,t$ path $P$ in a rectangular grid graph $\mathbb{G}$ is a Hamiltonian path from the top-left corner $s$ to the bottom-right corner $t$ such that each \emph{internal} subpath of $P$ with both endpoints $a$ and $b$ on the boundary of $\mathbb{G}$ has the minimum number of bends needed to travel from $a$ to $b$ (i.e., $0$, $1$, or $2$ bends, depending on whether $a$ and $b$ are on…
▽ More
A \emph{simple} $s,t$ path $P$ in a rectangular grid graph $\mathbb{G}$ is a Hamiltonian path from the top-left corner $s$ to the bottom-right corner $t$ such that each \emph{internal} subpath of $P$ with both endpoints $a$ and $b$ on the boundary of $\mathbb{G}$ has the minimum number of bends needed to travel from $a$ to $b$ (i.e., $0$, $1$, or $2$ bends, depending on whether $a$ and $b$ are on opposite, adjacent, or the same side of the bounding rectangle). Here, we show that $P$ can be reconfigured to any other simple $s,t$ path of $\mathbb{G}$ by \emph{switching $2\times 2$ squares}, where at most ${5}|\mathbb{G}|/{4}$ such operations are required. Furthermore, each \emph{square-switch} is done in $O(1)$ time and keeps the resulting path in the same family of simple $s,t$ paths. Our reconfiguration result proves that the \emph{Hamiltonian path graph} $\cal{G}$ for simple $s,t$ paths is connected and has diameter at most ${5}|\mathbb{G}|/{4}$ which is asymptotically tight.
△ Less
Submitted 16 May, 2022;
originally announced May 2022.
-
Entailment Relation Aware Paraphrase Generation
Authors:
Abhilasha Sancheti,
Balaji Vasan Srinivasan,
Rachel Rudinger
Abstract:
We introduce a new task of entailment relation aware paraphrase generation which aims at generating a paraphrase conforming to a given entailment relation (e.g. equivalent, forward entailing, or reverse entailing) with respect to a given input. We propose a reinforcement learning-based weakly-supervised paraphrasing system, ERAP, that can be trained using existing paraphrase and natural language i…
▽ More
We introduce a new task of entailment relation aware paraphrase generation which aims at generating a paraphrase conforming to a given entailment relation (e.g. equivalent, forward entailing, or reverse entailing) with respect to a given input. We propose a reinforcement learning-based weakly-supervised paraphrasing system, ERAP, that can be trained using existing paraphrase and natural language inference (NLI) corpora without an explicit task-specific corpus. A combination of automated and human evaluations show that ERAP generates paraphrases conforming to the specified entailment relation and are of good quality as compared to the baselines and uncontrolled paraphrasing systems. Using ERAP for augmenting training data for downstream textual entailment task improves performance over an uncontrolled paraphrasing system, and introduces fewer training artifacts, indicating the benefit of explicit control during paraphrasing.
△ Less
Submitted 20 March, 2022;
originally announced March 2022.
-
ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs
Authors:
Manas Gaur,
Kalpa Gunaratna,
Vijay Srinivasan,
Hongxia **
Abstract:
Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users' needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS…
▽ More
Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users' needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end-user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative-adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows the acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline.
△ Less
Submitted 12 December, 2021;
originally announced December 2021.
-
Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting
Authors:
Vishnu Sanjay Ramiya Srinivasan,
Rui Ma,
Qiang Tang,
Zili Yi,
Zhan Xu
Abstract:
Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects in videos. To maintain the temporal consistency among the frames, 3D spatial and temporal operations are often heavily used in the deep networks. However, these methods usually suffer from memory constraints and can only handle low resolution videos. We propo…
▽ More
Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects in videos. To maintain the temporal consistency among the frames, 3D spatial and temporal operations are often heavily used in the deep networks. However, these methods usually suffer from memory constraints and can only handle low resolution videos. We propose STRA-Net, a novel spatial-temporal residual aggregation framework for high resolution video inpainting. The key idea is to first learn and apply a spatial and temporal inpainting network on the downsampled low resolution videos. Then, we refine the low resolution results by aggregating the learned spatial and temporal image residuals (details) to the upsampled inpainted frames. Both the quantitative and qualitative evaluations show that we can produce more temporal-coherent and visually appealing results than the state-of-the-art methods on inpainting high resolution videos.
△ Less
Submitted 5 November, 2021;
originally announced November 2021.
-
CLAUSEREC: A Clause Recommendation Framework for AI-aided Contract Authoring
Authors:
Vinay Aggarwal,
Aparna Garimella,
Balaji Vasan Srinivasan,
Anandhavelu N,
Rajiv Jain
Abstract:
Contracts are a common type of legal document that frequent in several day-to-day business workflows. However, there has been very limited NLP research in processing such documents, and even lesser in generating them. These contracts are made up of clauses, and the unique nature of these clauses calls for specific methods to understand and generate such documents. In this paper, we introduce the t…
▽ More
Contracts are a common type of legal document that frequent in several day-to-day business workflows. However, there has been very limited NLP research in processing such documents, and even lesser in generating them. These contracts are made up of clauses, and the unique nature of these clauses calls for specific methods to understand and generate such documents. In this paper, we introduce the task of clause recommendation, asa first step to aid and accelerate the author-ing of contract documents. We propose a two-staged pipeline to first predict if a specific clause type is relevant to be added in a contract, and then recommend the top clauses for the given type based on the contract context. We pretrain BERT on an existing library of clauses with two additional tasks and use it for our prediction and recommendation. We experiment with classification methods and similarity-based heuristics for clause relevance prediction, and generation-based methods for clause recommendation, and evaluate the results from various methods on several clause types. We provide analyses on the results, and further outline the advantages and limitations of the various methods for this line of research.
△ Less
Submitted 26 October, 2021;
originally announced October 2021.
-
Latency-Redundancy Tradeoff in Distributed Read-Write Systems
Authors:
Saraswathy Ramanathan,
Gaurav Gautam,
Vikram Srinivasan,
Parimal Parag
Abstract:
Data is replicated and stored redundantly over multiple servers for availability in distributed databases. We focus on databases with frequent reads and writes, where both read and write latencies are important. This is in contrast to databases designed primarily for either read or write applications. Redundancy has contrasting effects on read and write latency. Read latency can be reduced by pote…
▽ More
Data is replicated and stored redundantly over multiple servers for availability in distributed databases. We focus on databases with frequent reads and writes, where both read and write latencies are important. This is in contrast to databases designed primarily for either read or write applications. Redundancy has contrasting effects on read and write latency. Read latency can be reduced by potential parallel access from multiple servers, whereas write latency increases as a larger number of replicas have to be updated. We quantify this tradeoff between read and write latency as a function of redundancy, and provide a closed-form approximation when the request arrival is Poisson and the service is memoryless. We empirically show that this approximation is tight across all ranges of system parameters. Thus, we provide guidelines for redundancy selection in distributed databases.
△ Less
Submitted 27 November, 2021; v1 submitted 31 August, 2021;
originally announced August 2021.
-
Using Neighborhood Context to Improve Information Extraction from Visual Documents Captured on Mobile Phones
Authors:
Kalpa Gunaratna,
Vijay Srinivasan,
Sandeep Nama,
Hongxia **
Abstract:
Information Extraction from visual documents enables convenient and intelligent assistance to end users. We present a Neighborhood-based Information Extraction (NIE) approach that uses contextual language models and pays attention to the local neighborhood context in the visual documents to improve information extraction accuracy. We collect two different visual document datasets and show that our…
▽ More
Information Extraction from visual documents enables convenient and intelligent assistance to end users. We present a Neighborhood-based Information Extraction (NIE) approach that uses contextual language models and pays attention to the local neighborhood context in the visual documents to improve information extraction accuracy. We collect two different visual document datasets and show that our approach outperforms the state-of-the-art global context-based IE technique. In fact, NIE outperforms existing approaches in both small and large model sizes. Our on-device implementation of NIE on a mobile platform that generally requires small models showcases NIE's usefulness in practical real-world applications.
△ Less
Submitted 23 August, 2021;
originally announced August 2021.
-
Multi-Stage Graph Peeling Algorithm for Probabilistic Core Decomposition
Authors:
Yang Guo,
Xuekui Zhang,
Fatemeh Esfahani,
Venkatesh Srinivasan,
Alex Thomo,
Li Xing
Abstract:
Mining dense subgraphs where vertices connect closely with each other is a common task when analyzing graphs. A very popular notion in subgraph analysis is core decomposition. Recently, Esfahani et al. presented a probabilistic core decomposition algorithm based on graph peeling and Central Limit Theorem (CLT) that is capable of handling very large graphs. Their proposed peeling algorithm (PA) sta…
▽ More
Mining dense subgraphs where vertices connect closely with each other is a common task when analyzing graphs. A very popular notion in subgraph analysis is core decomposition. Recently, Esfahani et al. presented a probabilistic core decomposition algorithm based on graph peeling and Central Limit Theorem (CLT) that is capable of handling very large graphs. Their proposed peeling algorithm (PA) starts from the lowest degree vertices and recursively deletes these vertices, assigning core numbers, and updating the degree of neighbour vertices until it reached the maximum core. However, in many applications, particularly in biology, more valuable information can be obtained from dense sub-communities and we are not interested in small cores where vertices do not interact much with others. To make the previous PA focus more on dense subgraphs, we propose a multi-stage graph peeling algorithm (M-PA) that has a two-stage data screening procedure added before the previous PA. After removing vertices from the graph based on the user-defined thresholds, we can reduce the graph complexity largely and without affecting the vertices in subgraphs that we are interested in. We show that M-PA is more efficient than the previous PA and with the properly set filtering threshold, can produce very similar if not identical dense subgraphs to the previous PA (in terms of graph density and clustering coefficient).
△ Less
Submitted 13 August, 2021;
originally announced August 2021.
-
On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy
Authors:
Vignesh Srinivasan,
Nils Strodthoff,
Jackie Ma,
Alexander Binder,
Klaus-Robert Müller,
Wojciech Samek
Abstract:
There is an increasing number of medical use-cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches. For diabetic retinopathy grading as exempla…
▽ More
There is an increasing number of medical use-cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches. For diabetic retinopathy grading as exemplary use case, we compare the impact of different training procedures including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits to supervised models. Self-supervised models with initialization from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions along with improvements in taking into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use-case considered in this work.
△ Less
Submitted 25 June, 2021;
originally announced June 2021.
-
Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Authors:
Dhruv Malik,
Aldo Pacchiano,
Vishwak Srinivasan,
Yuanzhi Li
Abstract:
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linea…
▽ More
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
△ Less
Submitted 14 June, 2021;
originally announced June 2021.
-
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Authors:
Chia-Yu Chen,
Jiamin Ni,
Songtao Lu,
Xiaodong Cui,
Pin-Yu Chen,
Xiao Sun,
Naigang Wang,
Swagath Venkataramani,
Vijayalakshmi Srinivasan,
Wei Zhang,
Kailash Gopalakrishnan
Abstract:
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large scale distributed systems (due to gradient build-up) and/o…
▽ More
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
△ Less
Submitted 20 April, 2021;
originally announced April 2021.
-
IGA : An Intent-Guided Authoring Assistant
Authors:
Simeng Sun,
Wenlong Zhao,
Varun Manjunatha,
Rajiv Jain,
Vlad Morariu,
Franck Dernoncourt,
Balaji Vasan Srinivasan,
Mohit Iyyer
Abstract:
While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to fine-grained author specifications. Users provide input to our In…
▽ More
While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored. We leverage advances in language modeling to build an interactive writing assistant that generates and rephrases text according to fine-grained author specifications. Users provide input to our Intent-Guided Assistant (IGA) in the form of text interspersed with tags that correspond to specific rhetorical directives (e.g., adding description or contrast, or rephrasing a particular sentence). We fine-tune a language model on a dataset heuristically-labeled with author intent, which allows IGA to fill in these tags with generated text that users can subsequently edit to their liking. A series of automatic and crowdsourced evaluations confirm the quality of IGA's generated outputs, while a small-scale user study demonstrates author preference for IGA over baseline methods in a creative writing task. We release our dataset, code, and demo to spur further research into AI-assisted writing.
△ Less
Submitted 19 September, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Exponential Modalities and Complementarity (extended abstract)
Authors:
Robin Cockett,
Priyaa Varshinee Srinivasan
Abstract:
The exponential modalities of linear logic have been used by various authors to model infinite-dimensional quantum systems. This paper explains how these modalities can also give rise to the complementarity principle of quantum mechanics.
The paper uses a formulation of quantum systems based on dagger-linear logic, whose categorical semantics lies in mixed unitary categories, and a formulatio…
▽ More
The exponential modalities of linear logic have been used by various authors to model infinite-dimensional quantum systems. This paper explains how these modalities can also give rise to the complementarity principle of quantum mechanics.
The paper uses a formulation of quantum systems based on dagger-linear logic, whose categorical semantics lies in mixed unitary categories, and a formulation of measurement therein. The main result exhibits a complementary system as the result of measurements on free exponential modalities. Recalling that, in linear logic, exponential modalities have two distinct but dual components, ! and ?, this shows how these components under measurement become "compacted" into the usual notion of complementary Frobenius algebras from categorical quantum mechanics.
△ Less
Submitted 3 November, 2022; v1 submitted 8 March, 2021;
originally announced March 2021.
-
DRAG: Director-Generator Language Modelling Framework for Non-Parallel Author Stylized Rewriting
Authors:
Hrituraj Singh,
Gaurav Verma,
Aparna Garimella,
Balaji Vasan Srinivasan
Abstract:
Author stylized rewriting is the task of rewriting an input text in a particular author's style. Recent works in this area have leveraged Transformer-based language models in a denoising autoencoder setup to generate author stylized text without relying on a parallel corpus of data. However, these approaches are limited by the lack of explicit control of target attributes and being entirely data-d…
▽ More
Author stylized rewriting is the task of rewriting an input text in a particular author's style. Recent works in this area have leveraged Transformer-based language models in a denoising autoencoder setup to generate author stylized text without relying on a parallel corpus of data. However, these approaches are limited by the lack of explicit control of target attributes and being entirely data-driven. In this paper, we propose a Director-Generator framework to rewrite content in the target author's style, specifically focusing on certain target attributes. We show that our proposed framework works well even with a limited-sized target author corpus. Our experiments on corpora consisting of relatively small-sized text authored by three distinct authors show significant improvements upon existing works to rewrite input texts in target author's style. Our quantitative and qualitative analyses further show that our model has better meaning retention and results in more fluent generations.
△ Less
Submitted 28 January, 2021;
originally announced January 2021.
-
Spatial Reasoning from Natural Language Instructions for Robot Manipulation
Authors:
Sagar Gubbi Venkatesh,
Anirban Biswas,
Raviteja Upadrashta,
Vikram Srinivasan,
Partha Talukdar,
Bharadwaj Amrutur
Abstract:
Robots that can manipulate objects in unstructured environments and collaborate with humans can benefit immensely by understanding natural language. We propose a pipelined architecture of two stages to perform spatial reasoning on the text input. All the objects in the scene are first localized, and then the instruction for the robot in natural language and the localized co-ordinates are mapped to…
▽ More
Robots that can manipulate objects in unstructured environments and collaborate with humans can benefit immensely by understanding natural language. We propose a pipelined architecture of two stages to perform spatial reasoning on the text input. All the objects in the scene are first localized, and then the instruction for the robot in natural language and the localized co-ordinates are mapped to the start and end co-ordinates corresponding to the locations where the robot must pick up and place the object respectively. We show that representing the localized objects by quantizing their positions to a binary grid is preferable to representing them as a list of 2D co-ordinates. We also show that attention improves generalization and can overcome biases in the dataset. The proposed method is used to pick-and-place playing cards using a robot arm.
△ Less
Submitted 26 March, 2021; v1 submitted 26 December, 2020;
originally announced December 2020.
-
Multi-Style Transfer with Discriminative Feedback on Disjoint Corpus
Authors:
Navita Goyal,
Balaji Vasan Srinivasan,
Anandhavelu Natarajan,
Abhilasha Sancheti
Abstract:
Style transfer has been widely explored in natural language generation with non-parallel corpus by directly or indirectly extracting a notion of style from source and target domain corpus. A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. Availability of such dataset across a combination of styles limits th…
▽ More
Style transfer has been widely explored in natural language generation with non-parallel corpus by directly or indirectly extracting a notion of style from source and target domain corpus. A common shortcoming of existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. Availability of such dataset across a combination of styles limits the extension of these setups to multiple style dimensions. While cascading single-dimensional models across multiple styles is a possibility, it suffers from content loss, especially when the style dimensions are not completely independent of each other. In our work, we relax this requirement of jointly annotated data across multiple styles by using independently acquired data across different style dimensions without any additional annotations. We initialize an encoder-decoder setup with transformer-based language model pre-trained on a generic corpus and enhance its re-writing capability to multiple target style dimensions by employing multiple style-aware language models as discriminators. Through quantitative and qualitative evaluation, we show the ability of our model to control styles across multiple style dimensions while preserving content of the input text. We compare it against baselines involving cascaded state-of-the-art uni-dimensional style transfer models.
△ Less
Submitted 12 April, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Incorporating Stylistic Lexical Preferences in Generative Language Models
Authors:
Hrituraj Singh,
Gaurav Verma,
Balaji Vasan Srinivasan
Abstract:
While recent advances in language modeling have resulted in powerful generation models, their generation style remains implicitly dependent on the training data and can not emulate a specific target style. Leveraging the generative capabilities of a transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lex…
▽ More
While recent advances in language modeling have resulted in powerful generation models, their generation style remains implicitly dependent on the training data and can not emulate a specific target style. Leveraging the generative capabilities of a transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models. We introduce rewarding strategies in a reinforcement learning framework that encourages the use of words across multiple categorical dimensions, to varying extents. Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author's lexical style. We conduct quantitative and qualitative comparisons with competitive and relevant baselines to illustrate the benefits of the proposed approach.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
MMH* with arbitrary modulus is always almost-universal
Authors:
Khodakhast Bibak,
Bruce M. Kapron,
Venkatesh Srinivasan
Abstract:
Universal hash functions, discovered by Carter and Wegman in 1979, are of great importance in computer science with many applications. MMH$^*$ is a well-known $\triangle$-universal hash function family, based on the evaluation of a dot product modulo a prime. In this paper, we introduce a generalization of MMH$^*$, that we call GMMH$^*$, using the same construction as MMH$^*$ but with an arbitrary…
▽ More
Universal hash functions, discovered by Carter and Wegman in 1979, are of great importance in computer science with many applications. MMH$^*$ is a well-known $\triangle$-universal hash function family, based on the evaluation of a dot product modulo a prime. In this paper, we introduce a generalization of MMH$^*$, that we call GMMH$^*$, using the same construction as MMH$^*$ but with an arbitrary integer modulus $n>1$, and show that GMMH$^*$ is $\frac{1}{p}$-almost-$\triangle$-universal, where $p$ is the smallest prime divisor of $n$. This bound is tight.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
Unweighted linear congruences with distinct coordinates and the Varshamov--Tenengolts codes
Authors:
Khodakhast Bibak,
Bruce M. Kapron,
Venkatesh Srinivasan
Abstract:
In this paper, we first give explicit formulas for the number of solutions of unweighted linear congruences with distinct coordinates. Our main tools are properties of Ramanujan sums and of the discrete Fourier transform of arithmetic functions. Then, as an application, we derive an explicit formula for the number of codewords in the Varshamov--Tenengolts code $VT_b(n)$ with Hamming weight $k$, th…
▽ More
In this paper, we first give explicit formulas for the number of solutions of unweighted linear congruences with distinct coordinates. Our main tools are properties of Ramanujan sums and of the discrete Fourier transform of arithmetic functions. Then, as an application, we derive an explicit formula for the number of codewords in the Varshamov--Tenengolts code $VT_b(n)$ with Hamming weight $k$, that is, with exactly $k$ $1$'s. The Varshamov--Tenengolts codes are an important class of codes that are capable of correcting asymmetric errors on a $Z$-channel. As another application, we derive Ginzburg's formula for the number of codewords in $VT_b(n)$, that is, $|VT_b(n)|$. We even go further and discuss connections to several other combinatorial problems, some of which have appeared in seemingly unrelated contexts. This provides a general framework and gives new insight into all these problems which might lead to further work.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
The Cayley graphs associated with some quasi-perfect Lee codes are Ramanujan graphs
Authors:
Khodakhast Bibak,
Bruce M. Kapron,
Venkatesh Srinivasan
Abstract:
Let $\Z_n[i]$ be the ring of Gaussian integers modulo a positive integer $n$. Very recently, Camarero and Martínez [IEEE Trans. Inform. Theory, {\bf 62} (2016), 1183--1192], showed that for every prime number $p>5$ such that $p\equiv \pm 5 \pmod{12}$, the Cayley graph $\mathcal{G}_p=\textnormal{Cay}(\Z_p[i], S_2)$, where $S_2$ is the set of units of $\Z_p[i]$, induces a 2-quasi-perfect Lee code ov…
▽ More
Let $\Z_n[i]$ be the ring of Gaussian integers modulo a positive integer $n$. Very recently, Camarero and Martínez [IEEE Trans. Inform. Theory, {\bf 62} (2016), 1183--1192], showed that for every prime number $p>5$ such that $p\equiv \pm 5 \pmod{12}$, the Cayley graph $\mathcal{G}_p=\textnormal{Cay}(\Z_p[i], S_2)$, where $S_2$ is the set of units of $\Z_p[i]$, induces a 2-quasi-perfect Lee code over $\Z_p^m$, where $m=2\lfloor \frac{p}{4}\rfloor$. They also conjectured that $\mathcal{G}_p$ is a Ramanujan graph for every prime $p$ such that $p\equiv 3 \pmod{4}$. In this paper, we solve this conjecture. Our main tools are Deligne's bound from 1977 for estimating a particular kind of trigonometric sum and a result of Lovász from 1975 (or of Babai from 1979) which gives the eigenvalues of Cayley graphs of finite Abelian groups. Our proof techniques may motivate more work in the interactions between spectral graph theory, character theory, and coding theory, and may provide new ideas towards the famous Golomb--Welch conjecture on the existence of perfect Lee codes.
△ Less
Submitted 11 October, 2020;
originally announced October 2020.
-
Langevin Cooling for Domain Translation
Authors:
Vignesh Srinivasan,
Klaus-Robert Müller,
Wojciech Samek,
Shinichi Nakajima
Abstract:
Domain translation is the task of finding correspondence between two domains. Several Deep Neural Network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting---the map**s between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods…
▽ More
Domain translation is the task of finding correspondence between two domains. Several Deep Neural Network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting---the map**s between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods typically do not perform well on a significant proportion of test samples. In this paper, we hypothesize that many of such unsuccessful samples lie at the fringe---relatively low-density areas---of data distribution, where the DNN was not trained very well, and propose to perform Langevin dynamics to bring such fringe samples towards high density areas. We demonstrate qualitatively and quantitatively that our strategy, called Langevin Cooling (L-Cool), enhances state-of-the-art methods in image translation and language translation tasks.
△ Less
Submitted 31 August, 2020;
originally announced August 2020.
-
Animating Through War**: an Efficient Method for High-Quality Facial Expression Animation
Authors:
Zili Yi,
Qiang Tang,
Vishnu Sanjay Ramiya Srinivasan,
Zhan Xu
Abstract:
Advances in deep neural networks have considerably improved the art of animating a still image without operating in 3D domain. Whereas, prior arts can only animate small images (typically no larger than 512x512) due to memory limitations, difficulty of training and lack of high-resolution (HD) training datasets, which significantly reduce their potential for applications in movie production and in…
▽ More
Advances in deep neural networks have considerably improved the art of animating a still image without operating in 3D domain. Whereas, prior arts can only animate small images (typically no larger than 512x512) due to memory limitations, difficulty of training and lack of high-resolution (HD) training datasets, which significantly reduce their potential for applications in movie production and interactive systems. Motivated by the idea that HD images can be generated by adding high-frequency residuals to low-resolution results produced by a neural network, we propose a novel framework known as Animating Through War** (ATW) to enable efficient animation of HD images.
Specifically, the proposed framework consists of two modules, a novel two-stage neural-network generator and a novel post-processing module known as Animating Through War** (ATW). It only requires the generator to be trained on small images and can do inference on an image of any size. During inference, an HD input image is decomposed into a low-resolution component(128x128) and its corresponding high-frequency residuals. The generator predicts the low-resolution result as well as the motion field that warps the input face to the desired status (e.g., expressions categories or action units). Finally, the ResWarp module warps the residuals based on the motion field and adding the warped residuals to generates the final HD results from the naively up-sampled low-resolution results. Experiments show the effectiveness and efficiency of our method in generating high-resolution animations. Our proposed framework successfully animates a 4K facial image, which has never been achieved by prior neural models. In addition, our method generally guarantee the temporal coherency of the generated animations. Source codes will be made publicly available.
△ Less
Submitted 1 August, 2020;
originally announced August 2020.
-
Utility-Based Graph Summarization: New and Improved
Authors:
Mahdi Hajiabadi,
Jasbir Singh,
Venkatesh Srinivasan,
Alex Thomo
Abstract:
A fundamental challenge in graph mining is the ever-increasing size of datasets. Graph summarization aims to find a compact representation resulting in faster algorithms and reduced storage needs. The flip side of graph summarization is the loss of utility which diminishes its usability. The key questions we address in this paper are: (1)How to summarize a graph without any loss of utility? (2)How…
▽ More
A fundamental challenge in graph mining is the ever-increasing size of datasets. Graph summarization aims to find a compact representation resulting in faster algorithms and reduced storage needs. The flip side of graph summarization is the loss of utility which diminishes its usability. The key questions we address in this paper are: (1)How to summarize a graph without any loss of utility? (2)How to summarize a graph with some loss of utility but above a user-specified threshold? (3)How to query graph summaries without graph reconstruction?} We also aim at making graph summarization available for the masses by efficiently handling web-scale graphs using only a consumer-grade machine. Previous works suffer from conceptual limitations and lack of scalability. In this work, we make three key contributions. First, we present a utility-driven graph summarization method, based on a clique and independent set decomposition, that produces significant compression with zero loss of utility. The compression provided is significantly better than state-of-the-art in lossless graph summarization, while the runtime is two orders of magnitude lower. Second, we present a highly scalable algorithm for the lossy case, which foregoes the expensive iterative process that hampers previous work. Our algorithm achieves this by combining a memory reduction technique and a novel binary-search approach. In contrast to the competition, we are able to handle web-scale graphs in a single machine without a performance impediment as the utility threshold (and size of summary) decreases. Third, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, Pagerank, and shortest paths. This is in contrast to other works that incrementally reconstruct the original graph for answering queries, thus incurring additional time costs.
△ Less
Submitted 16 June, 2020;
originally announced June 2020.
-
Nucleus Decomposition in Probabilistic Graphs: Hardness and Algorithms
Authors:
Fatemeh Esfahani,
Venkatesh Srinivasan,
Alex Thomo,
Kui Wu
Abstract:
Finding dense components in graphs is of great importance in analyzing the structure of networks. Popular and computationally feasible frameworks for discovering dense subgraphs are core and truss decompositions. Recently, Sariyuce et al. introduced nucleus decomposition, a generalization which uses higher-order structures and can reveal interesting subgraphs that can be missed by core and truss d…
▽ More
Finding dense components in graphs is of great importance in analyzing the structure of networks. Popular and computationally feasible frameworks for discovering dense subgraphs are core and truss decompositions. Recently, Sariyuce et al. introduced nucleus decomposition, a generalization which uses higher-order structures and can reveal interesting subgraphs that can be missed by core and truss decompositions.
In this paper, we present nucleus decomposition in probabilistic graphs. We study the most interesting case of nucleus decomposition, k-(3,4)-nucleus, which asks for maximal subgraphs where each triangle is contained in k 4-cliques. The major questions we address are: How to define meaningfully nucleus decomposition in probabilistic graphs? How hard is computing nucleus decomposition in probabilistic graphs? Can we devise efficient algorithms for exact or approximate nucleus decomposition in large graphs?
We present three natural definitions of nucleus decomposition in probabilistic graphs: local, global, and weakly-global. We show that the local version is in PTIME, whereas global and weakly-global are #P-hard and NP-hard, respectively. We present an efficient and exact dynamic programming approach for the local case and furthermore, present statistical approximations that can scale to large datasets without much loss of accuracy. For global and weakly-global decompositions, we complement our intractability results by proposing efficient algorithms that give approximate solutions based on search space pruning and Monte-Carlo sampling. Our extensive experimental results show the scalability and efficiency of our algorithms. Compared to probabilistic core and truss decompositions, nucleus decomposition significantly outperforms in terms of density and clustering metrics.
△ Less
Submitted 3 November, 2021; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Reinforced Rewards Framework for Text Style Transfer
Authors:
Abhilasha Sancheti,
Kundan Krishna,
Balaji Vasan Srinivasan,
Anandhavelu Natarajan
Abstract:
Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In…
▽ More
Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In this work, we propose a reinforcement learning based framework that directly rewards the framework on these target metrics yielding a better transfer of the target style. We show the improved performance of our proposed framework based on automatic and human evaluation on three independent tasks: wherein we transfer the style of text from formal to informal, high excitement to low excitement, modern English to Shakespearean English, and vice-versa in all the three cases. Improved performance of the proposed framework over existing state-of-the-art frameworks indicates the viability of the approach.
△ Less
Submitted 11 May, 2020;
originally announced May 2020.
-
Towards Transparent and Explainable Attention Models
Authors:
Akash Kumar Mohankumar,
Preksha Nema,
Sharan Narasimhan,
Mitesh M. Khapra,
Balaji Vasan Srinivasan,
Balaraman Ravindran
Abstract:
Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for th…
▽ More
Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM based encoders can neither provide a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM based encoders the hidden representations at different time-steps are very similar to each other (high conicity) and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states (ii) are better indicative of words important for the model's predictions (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention
△ Less
Submitted 29 April, 2020;
originally announced April 2020.
-
Generating summaries tailored to target characteristics
Authors:
Kushal Chawla,
Hrituraj Singh,
Arijit Pramanik,
Mithlesh Kumar,
Balaji Vasan Srinivasan
Abstract:
Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective,…
▽ More
Recently, research efforts have gained pace to cater to varied user preferences while generating text summaries. While there have been attempts to incorporate a few handpicked characteristics such as length or entities, a holistic view around these preferences is missing and crucial insights on why certain characteristics should be incorporated in a specific manner are absent. With this objective, we provide a categorization around these characteristics relevant to the task of text summarization: one, focusing on what content needs to be generated and second, focusing on the stylistic aspects of the output summaries. We use our insights to provide guidelines on appropriate methods to incorporate various classes characteristics in sequence-to-sequence summarization framework. Our experiments with incorporating topics, readability and simplicity indicate the viability of the proposed prescriptions
△ Less
Submitted 18 December, 2019;
originally announced December 2019.