Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

Shi, Xian; Chen, Yanni; Zhang, Shiliang; Yan, Zhijie

arXiv:2301.12343 (cs)

[Submitted on 29 Jan 2023]

Abstract: Conventional ASR systems use frame-level phoneme posteriors to perform forced alignment (FA) and provide timestamps, while end-to-end ASR systems, especially AED-based ones, lack this ability. This paper proposes to perform timestamp prediction (TP) while recognizing by utilizing the continuous integrate-and-fire (CIF) mechanism in a non-autoregressive ASR model, Paraformer. Focusing on the fire place bias issue of CIF, we apply post-processing strategies including fire-delay and silence insertion. In addition, we propose scaled-CIF to smooth the weights of the CIF output, which proves beneficial for both the ASR and TP tasks. Accumulated averaging shift (AAS) and diarization error rate (DER) are adopted to measure the quality of timestamps, and we compare these metrics for the proposed system and a conventional hybrid force-alignment system. Experimental results on a test set with manually marked timestamps show that the proposed optimization methods significantly improve the accuracy of CIF timestamps, reducing AAS and DER by 66.7% and 82.1% respectively. Compared to Kaldi force-alignment trained with the same data, the optimized CIF timestamps achieve a 12.3% relative AAS reduction.
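
As a rough illustration of how CIF firing can yield token timestamps, the following is a minimal sketch, not the paper's implementation. It assumes the generic integrate-and-fire rule (accumulate per-frame weights and fire when the sum reaches a threshold of 1.0), that each weight is below the threshold so at most one token fires per frame, and that a token ends at the end of its firing frame. The function names, the `frame_shift` value, and the example weights are all hypothetical; the fire-delay, silence insertion, and scaled-CIF steps named in the abstract are not modeled.

```python
from typing import List, Tuple


def cif_fire_frames(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame indices at which the accumulated weight reaches the threshold.

    Assumes every weight is smaller than the threshold, so a frame fires at most once.
    """
    fires = []
    acc = 0.0
    for t, alpha in enumerate(alphas):
        acc += alpha
        if acc >= threshold:
            fires.append(t)
            acc -= threshold  # carry the remainder over to the next token
    return fires


def fires_to_spans(fires: List[int], frame_shift: float) -> List[Tuple[float, float]]:
    """Turn fire frames into (start, end) times in seconds, one span per token.

    Assumption: a token ends at the end of its firing frame and the next
    token starts at that same instant.
    """
    spans = []
    start = 0.0
    for f in fires:
        end = (f + 1) * frame_shift
        spans.append((start, end))
        start = end
    return spans


# Hypothetical per-frame weights and frame shift, chosen only for illustration.
alphas = [0.5, 0.5, 0.25, 0.25, 0.5, 0.0]
fires = cif_fire_frames(alphas)                 # [1, 4]
spans = fires_to_spans(fires, frame_shift=0.5)  # [(0.0, 1.0), (1.0, 2.5)]
```

Under these assumptions every frame is assigned to some token, so silence is absorbed into neighbouring tokens. This may be one reason the abstract mentions post-processing such as silence insertion.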

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2301.12343 [cs.SD] (or arXiv:2301.12343v1 [cs.SD] for this version)
https://doi.org/10.48550/arXiv.2301.12343

Submission history

From: Xian Shi
[v1] Sun, 29 Jan 2023 03:47:59 UTC (2,232 KB)
