Search | arXiv e-print repository

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Authors: Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo M. Ponti, Siva Reddy

Abstract: The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.

Submitted 23 October, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: TACL 2022 (20 pages, 3 figures, 10 tables)

arXiv:2204.07931 [pdf, other]

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

Authors: Nouha Dziri, Sivan Milton, Mo Yu, Osmar Zaiane, Siva Reddy

Abstract: Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. In this work, we investigate the underlying causes of this phenomenon: is hallucination due to the training data, or to the models? We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state-of-the-art models. Our study reveals that the standard benchmarks consist of >60% hallucinated responses, leading to models that not only hallucinate but even amplify hallucinations. Our findings raise important questions on the quality of existing datasets and models trained using them. We make our annotations publicly available for future research.

Submitted 17 April, 2022; originally announced April 2022.

Comments: NAACL 2022, 14 pages

arXiv:1612.02482 [pdf, other]

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

Authors: Krupakar Hans, R S Milton

Abstract: The advent of the attention mechanism in neural machine translation models has improved the performance of machine translation systems by enabling selective lookup into the source sentence. In this paper, the efficiencies of translation using bidirectional encoder attention decoder models were studied with respect to translation involving morphologically rich languages. The English - Tamil language pair was selected for this analysis. First, the use of Word2Vec embedding for both the English and Tamil words improved the translation results by 0.73 BLEU points over the baseline RNNSearch model with 4.84 BLEU score. The use of morphological segmentation before word vectorization to split the morphologically rich Tamil words into their respective morphemes before the translation, caused a reduction in the target vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the performance of neural machine translation by 7.05 BLEU points over the RNNSearch model used over the same corpus. Since the BLEU evaluation of the RNNMorph model might be unreliable due to an increase in the number of matching tokens per sentence, the performances of the translations were also compared by means of human evaluation metrics of adequacy, fluency and relative ranking. Further, the use of morphological segmentation also improved the efficacy of the attention mechanism.

Submitted 8 January, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: 21 pages, 11 figures, 2 tables, Corrected typos

arXiv:1606.03543 [pdf]

Using stories to bridge the chasm between perspectives: How metaphors and genres are used to share meaning

Authors: Emily Keen, Simon Milton, Rachelle Bosua

Abstract: Natural language, although complex in structure, contains considerable detail. All instances of language serve the purpose of making sense of experience and the intent of actors. Language conveys actors' personal reference to goals, responsibility, and values. In this paper we consider how actors from distinct perspectives communicate when they share common goals. We have observed the planning of a community and cultural event where actors are variously responsible for management and for artistic merit. Specifically, we consider actors' use of language as a tool to span perspectives and how functional discourse analysis tools and techniques enable a deeper interpretive understanding of the layers of discourse when derived from a rich context. We will also illustrate patterns we have found in the use of discourse and actors' ability to bridge the reasoning and logical gap between distinct perspectives through actors' reference to metaphors and genres.

Submitted 10 June, 2016; originally announced June 2016.

Comments: Research-in-progress ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

Report number: ACIS/2015/226

Showing 1–4 of 4 results for author: Milton, S