arXiv:2010.09657
PySBD: Pragmatic Sentence Boundary Disambiguation
Abstract: In this paper, we present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical sentences even when the format and domain of the input text are unknown. In our work, we adapt the Golden Rules Set (a language-specific set of sentence boundary exemplars) originally implemented as a rub…
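To make "rule-based sentence boundary disambiguation" and "sentence boundary exemplars" concrete, here is a toy sketch. It is not PySBD's implementation. It applies a single rule: split after `.`, `!` or `?` when followed by whitespace or end of text, except when the word before a `.` is a known abbreviation. The function name `naive_rule_segment`, the tiny `ABBREVIATIONS` set, and the example sentences (written in the style of Golden Rules exemplars) are hypothetical illustrations. Real rule sets cover many more cases, such as numbers, ellipses, quotations and lists.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (lowercase, final '.' stripped).
ABBREVIATIONS = {"mr", "mrs", "dr", "e.g", "i.e", "etc"}

def naive_rule_segment(text: str) -> list[str]:
    """Toy rule-based segmenter (illustrative only, not PySBD's algorithm).

    Candidate boundaries are runs of '.', '!' or '?' followed by whitespace
    or end of text. A single '.' is ignored when the preceding word is a
    known abbreviation.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        # Word immediately before the punctuation, lowercased, without its final '.'
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # rule: no boundary after an abbreviation
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()  # trailing text without final punctuation
    if tail:
        sentences.append(tail)
    return sentences

if __name__ == "__main__":
    print(naive_rule_segment("Mr. Smith went to Washington. He arrived late."))
    # prints ['Mr. Smith went to Washington.', 'He arrived late.']
```

Checking a segmenter against a fixed list of input/expected-output pairs, as the asserts on such examples would do, mirrors the idea of a Golden Rules Set used as a test suite. The abbreviation rule shows why a plain split on every period fails on inputs like "Mr. Smith".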
Submitted 19 October, 2020; originally announced October 2020.
Comments: 'PySBD: Pragmatic Sentence Boundary Disambiguation' is a short paper (5 pages including references) accepted at the 2nd Workshop for Natural Language Processing Open Source Software (NLP-OSS) at EMNLP 2020, held on 19 Nov 2020