-
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Authors:
Nouha Dziri,
Ehsan Kamalloo,
Sivan Milton,
Osmar Zaiane,
Mo Yu,
Edoardo M. Ponti,
Siva Reddy
Abstract:
The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as a training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts performance by 12.8 F1 points on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.
Submitted 23 October, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Authors:
Nouha Dziri,
Sivan Milton,
Mo Yu,
Osmar Zaiane,
Siva Reddy
Abstract:
Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. In this work, we investigate the underlying causes of this phenomenon: is hallucination due to the training data, or to the models? We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state-of-the-art models. Our study reveals that more than 60% of the responses in the standard benchmarks are hallucinated, leading to models that not only hallucinate but even amplify hallucinations. Our findings raise important questions on the quality of existing datasets and models trained using them. We make our annotations publicly available for future research.
Submitted 17 April, 2022;
originally announced April 2022.
-
Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages
Authors:
Krupakar Hans,
R S Milton
Abstract:
The advent of the attention mechanism in neural machine translation models has improved the performance of machine translation systems by enabling selective lookup into the source sentence. In this paper, the efficiencies of translation using bidirectional encoder attention decoder models were studied with respect to translation involving morphologically rich languages. The English-Tamil language pair was selected for this analysis. First, the use of Word2Vec embeddings for both the English and Tamil words improved the translation results by 0.73 BLEU points over the baseline RNNSearch model, which scored 4.84 BLEU. The use of morphological segmentation before word vectorization, which splits the morphologically rich Tamil words into their respective morphemes before translation, reduced the target vocabulary size by a factor of 8. This model (RNNMorph) also improved the performance of neural machine translation by 7.05 BLEU points over the RNNSearch model on the same corpus. Since the BLEU evaluation of the RNNMorph model might be unreliable due to an increase in the number of matching tokens per sentence, the translations were also compared by means of the human evaluation metrics of adequacy, fluency and relative ranking. Further, the use of morphological segmentation also improved the efficacy of the attention mechanism.
Submitted 8 January, 2017; v1 submitted 7 December, 2016;
originally announced December 2016.
-
Using stories to bridge the chasm between perspectives: How metaphors and genres are used to share meaning
Authors:
Emily Keen,
Simon Milton,
Rachelle Bosua
Abstract:
Natural language, although complex in structure, contains considerable detail. All instances of language serve the purpose of making sense of experience and the intent of actors. Language conveys actors' personal reference to goals, responsibility, and values. In this paper we consider how actors from distinct perspectives communicate when they share common goals. We have observed the planning of a community and cultural event where actors are variously responsible for management and for artistic merit.
Specifically, we consider actors' use of language as a tool to span perspectives and how functional discourse analysis tools and techniques enable a deeper interpretive understanding of the layers of discourse when derived from a rich context. We will also illustrate patterns we have found in the use of discourse and actors' ability to bridge the reasoning and logical gap between distinct perspectives through their reference to metaphors and genres.
Submitted 10 June, 2016;
originally announced June 2016.