SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization

Zhao, Yao; Saleh, Mohammad; Liu, Peter J.

arXiv:2006.10213 (cs)

[Submitted on 18 Jun 2020]

Abstract: Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens. We propose SEAL, a Transformer-based model, featuring a new encoder-decoder attention that dynamically extracts/selects input snippets to sparsely attend to for each output segment. Using only the original documents and summaries, we derive proxy labels that provide weak supervision for extractive layers simultaneously with regular supervision from abstractive summaries. The SEAL model achieves state-of-the-art results on existing long-form summarization tasks, and outperforms strong baseline models on a new dataset/task we introduce, Search2Wiki, with much longer input text. Since content selection is explicit in the SEAL model, a desirable side effect is that the selection can be inspected for enhanced interpretability.
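
The abstract does not say how the proxy labels are computed, so the following is only a rough sketch of one plausible approach, not the authors' procedure. It assumes the document is cut into fixed-size snippets and the summary into fixed-size segments, and that each segment is labeled with the snippets that share the most tokens with it. The function names (`split_into_chunks`, `overlap_score`, `proxy_labels`), the overlap measure, and all size parameters are hypothetical choices made for illustration.

```python
from collections import Counter


def split_into_chunks(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def overlap_score(snippet, segment):
    """Fraction of segment tokens (counted with multiplicity) found in the snippet.

    Assumption: simple unigram overlap stands in for whatever similarity
    measure the paper actually uses.
    """
    if not segment:
        return 0.0
    snippet_counts = Counter(snippet)
    segment_counts = Counter(segment)
    shared = sum(
        min(count, snippet_counts[token])
        for token, count in segment_counts.items()
    )
    return shared / len(segment)


def proxy_labels(doc_tokens, summary_tokens, snippet_len, segment_len, top_k):
    """For each summary segment, return the indices of its top_k snippets.

    Ties are broken in favor of the earlier snippet; indices are returned
    in document order.
    """
    snippets = split_into_chunks(doc_tokens, snippet_len)
    segments = split_into_chunks(summary_tokens, segment_len)
    labels = []
    for segment in segments:
        scores = [overlap_score(snippet, segment) for snippet in snippets]
        ranked = sorted(range(len(snippets)), key=lambda i: (-scores[i], i))
        labels.append(sorted(ranked[:top_k]))
    return labels


# Toy usage: three 4-token snippets, two 2-token summary segments.
doc = "cats sleep all day dogs bark at night birds sing at dawn".split()
summary = "cats sleep birds sing".split()
labels = proxy_labels(doc, summary, snippet_len=4, segment_len=2, top_k=1)
# labels == [[0], [2]]: the first segment maps to snippet 0, the second to snippet 2.
```

Under these assumptions, labels of this kind could serve as weak targets for an extractive layer, and the selected snippet indices per segment are the sort of explicit selection that could be inspected for interpretability.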

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2006.10213 [cs.CL] (or arXiv:2006.10213v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2006.10213

Submission history

From: Yao Zhao
[v1] Thu, 18 Jun 2020 00:13:21 UTC (248 KB)
