-
NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Authors:
Peter West,
Ronan Le Bras,
Taylor Sorensen,
Bill Yuchen Lin,
Liwei Jiang,
Ximing Lu,
Khyathi Chandu,
Jack Hessel,
Ashutosh Baheti,
Chandra Bhagavatula,
Yejin Choi
Abstract:
We present NovaCOMET, an open commonsense knowledge model that combines the best aspects of knowledge and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations, enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance for commonsense reasoning.
NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly-released discrete knowledge graph which can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective, replacing the fixed relation sets of past knowledge models, enabling arbitrary structures within the data to serve as inputs or outputs.
The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning only, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.
Submitted 10 December, 2023;
originally announced December 2023.
-
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Authors:
Ashutosh Baheti,
Ximing Lu,
Faeze Brahman,
Ronan Le Bras,
Maarten Sap,
Mark Riedl
Abstract:
Reinforcement Learning with Human Feedback (RLHF) is the most prominent method for Language Model (LM) alignment. However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for finetuning. We introduce Advantage-Leftover Lunch RL (A-LoL), a new class of offline policy gradient algorithms that enable RL training on any pre-existing data. By treating the entire LM output sequence as a single action, A-LoL allows incorporating sequence-level classifiers or human-designed scoring functions as rewards. Subsequently, by using the LM's value estimate, A-LoL trains only on positive-advantage (leftover) data points, making it resilient to noise. Overall, A-LoL is an easy-to-implement, sample-efficient, and stable LM training recipe.
We demonstrate the effectiveness of A-LoL and its variants with a set of four different language generation tasks. We compare against both online RL (PPO) and recent preference-based (DPO, PRO) and reward-based (GOLD) offline RL baselines. On the commonly-used RLHF benchmark, Helpful and Harmless Assistant (HHA), LMs trained with A-LoL methods achieve the highest diversity while also being rated more safe and helpful than the baselines according to humans. Additionally, in the remaining three tasks, A-LoL could optimize multiple distinct reward functions even when using noisy or suboptimal training data.
We also release our experimental code. https://github.com/abaheti95/LoL-RL
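The positive-advantage filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the advantage of a training example is its sequence-level reward minus a value estimate for the prompt, and the `Example` class, its field names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One offline training example (hypothetical structure)."""
    prompt: str
    response: str
    reward: float  # sequence-level reward, e.g. from a classifier (made-up value)
    value: float   # value estimate for the prompt (made-up value)


def advantage(ex: Example) -> float:
    # Assumption: advantage = reward of the whole response - value of the prompt.
    return ex.reward - ex.value


def keep_positive_advantage(data: List[Example]) -> List[Example]:
    # Keep only examples whose response scored above the value estimate.
    return [ex for ex in data if advantage(ex) > 0]


data = [
    Example("p1", "r1", reward=0.9, value=0.5),  # positive advantage, kept
    Example("p2", "r2", reward=0.2, value=0.6),  # negative advantage, dropped
    Example("p3", "r3", reward=0.7, value=0.7),  # zero advantage, dropped
]
kept = keep_positive_advantage(data)
print([ex.prompt for ex in kept])
```

Treating the whole response as a single action is what lets one scalar reward per example stand in for per-token rewards in this sketch.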
Submitted 19 April, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Stanceosaurus: Classifying Stance Towards Multilingual Misinformation
Authors:
Jonathan Zheng,
Ashutosh Baheti,
Tarek Naous,
Wei Xu,
Alan Ritter
Abstract:
We present Stanceosaurus, a new corpus of 28,033 tweets in English, Hindi, and Arabic annotated with stance towards 251 misinformation claims. As far as we are aware, it is the largest corpus annotated with stance towards misinformation claims. The claims in Stanceosaurus originate from 15 fact-checking sources that cover diverse geographical regions and cultures. Unlike existing stance datasets, we introduce a more fine-grained 5-class labeling strategy with additional subcategories to distinguish implicit stance. Pre-trained transformer-based stance classifiers that are fine-tuned on our corpus show good generalization on unseen claims and regional claims from countries outside the training data. Cross-lingual experiments demonstrate Stanceosaurus' capability of training multi-lingual models, achieving 53.1 F1 on Hindi and 50.4 F1 on Arabic without any target-language fine-tuning. Finally, we show how a domain adaptation method can be used to improve performance on Stanceosaurus using additional RumourEval-2019 data. We make Stanceosaurus publicly available to the research community and hope it will encourage further work on misinformation identification across languages and cultures.
Submitted 28 October, 2022;
originally announced October 2022.
-
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Authors:
Ashutosh Baheti,
Maarten Sap,
Alan Ritter,
Mark Riedl
Abstract:
Dialogue models trained on human conversations inadvertently learn to generate toxic responses. In addition to producing explicitly offensive utterances, these models can also implicitly insult a group or individual by aligning themselves with an offensive statement. To better understand the dynamics of contextually offensive language, we investigate the stance of dialogue model responses in offensive Reddit conversations. Specifically, we create ToxiChat, a crowd-annotated dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of human responses agree with toxic comments, whereas only 13% agree with safe comments. This undesirable behavior is learned by neural dialogue models, such as DialoGPT, which we show are two times more likely to agree with offensive comments. To enable automatic detection of offensive language, we fine-tune transformer-based classifiers on ToxiChat that achieve 0.71 F1 for offensive labels and 0.53 Macro-F1 for stance labels. Finally, we quantify the effectiveness of controllable text generation (CTG) methods to mitigate the tendency of neural dialogue models to agree with offensive comments. Compared to the baseline, our best CTG model achieves a 19% reduction in agreement with offensive comments and produces 29% fewer offensive replies. Our work highlights the need for further efforts to characterize and analyze inappropriate behavior in dialogue models, in order to help make them safer. Our code and corpus are available at https://github.com/abaheti95/ToxiChat.
Submitted 13 September, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
-
Extracting a Knowledge Base of COVID-19 Events from Social Media
Authors:
Shi Zong,
Ashutosh Baheti,
Wei Xu,
Alan Ritter
Abstract:
In this paper, we present a manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events, including positive and negative tests, deaths, denied access to testing, claimed cures and preventions. We designed slot-filling questions for each event type and annotated a total of 31 fine-grained slots, such as the location of events, recent travel, and close contacts. We show that our corpus can support fine-tuning BERT-based classifiers to automatically extract publicly reported events and help track the spread of a new disease. We also demonstrate that, by aggregating events extracted from millions of tweets, we achieve surprisingly high precision when answering complex queries, such as "Which organizations have employees that tested positive in Philadelphia?" We will release our corpus (with user-information removed), automatic extraction models, and the corresponding knowledge base to the research community.
Submitted 9 September, 2022; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Fluent Response Generation for Conversational Question Answering
Authors:
Ashutosh Baheti,
Alan Ritter,
Kevin Small
Abstract:
Question answering (QA) is an important aspect of open-domain conversational agents, garnering specific research focus in the conversational QA (ConvQA) subtask. One notable limitation of recent ConvQA efforts is that the response is an answer span extracted from the target corpus, thus ignoring the natural language generation (NLG) aspect of high-quality conversational agents. In this work, we propose a method for situating QA responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses while maintaining correctness. From a technical perspective, we use data augmentation to generate training data for an end-to-end system. Specifically, we develop Syntactic Transformations (STs) to produce question-specific candidate answer responses and rank them using a BERT-based classifier (Devlin et al., 2019). Human evaluation on SQuAD 2.0 data (Rajpurkar et al., 2018) demonstrates that the proposed model outperforms baseline CoQA and QuAC models in generating conversational responses. We further show our model's scalability by conducting tests on the CoQA dataset. The code and data are available at https://github.com/abaheti95/QADialogSystem.
Submitted 16 December, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Authors:
Ashutosh Baheti,
Alan Ritter,
Jiwei Li,
Bill Dolan
Abstract:
Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content-rich responses, based on a model of syntax and topics (Griffiths et al., 2005) and on semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation.
Submitted 4 September, 2018;
originally announced September 2018.
-
Non-linear Barrier Coverage using Mobile Wireless Sensors
Authors:
Ashutosh Baheti,
Arobinda Gupta
Abstract:
A belt region is said to be k-barrier covered by a set of sensors if all paths crossing the width of the belt region intersect the sensing regions of at least k sensors. Barrier coverage can be achieved from a random initial deployment of mobile sensors by suitably relocating the sensors to form a barrier. Reducing the movement of the sensors is important in such scenarios due to the energy constraints of sensor devices. In this paper, we propose a centralized algorithm that achieves 1-barrier coverage by forming a non-linear barrier from a random initial deployment of sensors in a belt. The algorithm combines a novel idea based on the physical behavior of chains with the concept of virtual force. Forming a non-linear barrier reduces the sensor movement needed compared to forming a linear barrier. Detailed simulation results are presented to show that the proposed algorithm achieves barrier coverage with less sensor movement than existing algorithms in the literature.
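The 1-barrier coverage condition defined in the abstract can be checked with a short sketch. This is not the relocation algorithm the paper proposes; it only tests whether a given placement already forms a barrier. It assumes a rectangular belt spanning x in [0, belt_length], identical disk-shaped sensing regions, and sensors located inside the belt, and it uses the standard observation that a chain of overlapping disks reaching from the left end to the right end blocks every crossing path. The function name and parameters are hypothetical.

```python
import math
from collections import deque
from typing import List, Tuple

Sensor = Tuple[float, float]  # (x, y) position of a sensor inside the belt


def is_1_barrier_covered(sensors: List[Sensor], radius: float,
                         belt_length: float) -> bool:
    """Return True if overlapping sensing disks chain from x=0 to x=belt_length."""
    # Start from every sensor whose disk reaches the left end of the belt.
    start = [i for i, (x, _) in enumerate(sensors) if x - radius <= 0]
    seen = set(start)
    queue = deque(start)
    while queue:
        i = queue.popleft()
        xi, yi = sensors[i]
        # A disk reaching the right end completes the barrier.
        if xi + radius >= belt_length:
            return True
        # Two disks overlap when their centers are at most 2 * radius apart.
        for j, (xj, yj) in enumerate(sensors):
            if j not in seen and math.hypot(xi - xj, yi - yj) <= 2 * radius:
                seen.add(j)
                queue.append(j)
    return False


# A non-linear (zig-zag) chain of sensors and a placement with a gap.
zigzag = [(1.0, 0.0), (2.5, 1.0), (4.0, 0.2), (5.5, 1.0)]
gapped = [(1.0, 0.0), (5.0, 0.0)]
print(is_1_barrier_covered(zigzag, radius=1.0, belt_length=6.0))
print(is_1_barrier_covered(gapped, radius=1.0, belt_length=6.0))
```

The zig-zag placement shows why a non-linear barrier can be valid: the sensors need not be collinear as long as consecutive sensing disks overlap.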
Submitted 22 November, 2016;
originally announced November 2016.