CNTLS: A Benchmark Dataset for Abstractive or Extractive Chinese Timeline Summarization

Mao, Qianren; Wang, Jiazheng; Wang, Zheng; Li, Xi; Li, Bo; Li, Jianxin

Computer Science > Artificial Intelligence

arXiv:2105.14201 (cs)

[Submitted on 29 May 2021 (v1), last revised 15 Nov 2023 (this version, v2)]

Abstract: Timeline summarization (TLS) summarizes long-running events as a series of dated summaries drawn from numerous news articles. However, limited data availability has significantly slowed the development of timeline summarization. In this paper, we introduce the CNTLS dataset, a versatile resource for Chinese timeline summarization. CNTLS covers 77 real-life topics, each with 2524 documents, and its summaries achieve a days-duration compression of nearly 60% on average across all topics.
We analyze the corpus in detail using well-known metrics, focusing on the style of the summaries and the complexity of the summarization task. In addition, we evaluate the performance of various extractive and generative summarization systems on the CNTLS corpus to provide benchmarks and support further research. To the best of our knowledge, CNTLS is the first Chinese timeline summarization dataset. The dataset and source code are publicly released (code and data available at: this https URL).

Subjects:	Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2105.14201 [cs.AI]
	(or arXiv:2105.14201v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2105.14201

Submission history

From: Qianren Mao
[v1] Sat, 29 May 2021 03:47:10 UTC (111 KB)
[v2] Wed, 15 Nov 2023 09:52:12 UTC (6,549 KB)
