Generative Pretraining for Paraphrase Evaluation

Weston, Jack; Lenain, Raphael; Meepegama, Udeepa; Fristed, Emil

arXiv:2107.08251v1 (cs)

[Submitted on 17 Jul 2021 (this version), latest version 24 Jul 2021 (v2)]

Abstract: We introduce ParaBLEU, a paraphrase representation learning model and evaluation metric for text generation. Unlike previous approaches, ParaBLEU learns to understand paraphrasis using generative conditioning as a pretraining objective. ParaBLEU correlates more strongly with human judgements than existing metrics, obtaining new state-of-the-art results on the 2017 WMT Metrics Shared Task. We show that our model is robust to data scarcity, exceeding previous state-of-the-art performance using only 50% of the available training data and surpassing BLEU, ROUGE and METEOR with only 40 labelled examples. Finally, we demonstrate that ParaBLEU can be used to conditionally generate novel paraphrases from a single demonstration, which we use to confirm our hypothesis that it learns abstract, generalized paraphrase representations.

Comments:	Under review
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2107.08251 [cs.CL]
	(or arXiv:2107.08251v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2107.08251

Submission history

From: Jack Weston
[v1] Sat, 17 Jul 2021 14:48:48 UTC (502 KB)
[v2] Sat, 24 Jul 2021 10:48:49 UTC (502 KB)
