Detection is the central problem in real-word spelling correction

Wilcox-O'Hearn, L. Amber

Computer Science > Computation and Language

arXiv:1408.3153 (cs)

[Submitted on 13 Aug 2014 (v1), last revised 15 Aug 2014 (this version, v2)]

Title: Detection is the central problem in real-word spelling correction

Authors: L. Amber Wilcox-O'Hearn

Abstract: Real-word spelling correction differs from non-word spelling correction in its aims and its challenges. Here we show that the central problem in real-word spelling correction is detection. Methods from non-word spelling correction, which focus instead on selection among candidate corrections, do not address detection adequately, because detection is either assumed in advance or heavily constrained. As we demonstrate in this paper, merely discriminating between the intended word and a random close variation of it within the context of a sentence is a task that can be performed with high accuracy using straightforward models. Trigram models are sufficient in almost all cases. The difficulty comes when every word in the sentence is a potential error, with a large set of possible candidate corrections. Despite their strengths, trigram models cannot reliably find true errors without introducing many more, at least not when used in the obvious sequential way without added structure. The detection task exposes weaknesses not visible in the selection task.
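The contrast between selection and detection can be illustrated with a minimal sketch. It assumes a toy trigram model with add-one smoothing, a tiny hypothetical training corpus, and two hand-picked confusion sets; `CORPUS`, `CONFUSION_SETS`, `TrigramModel`, `select`, and `detect` are illustrative names, not the paper's implementation, data, or smoothing method. In `select`, the error position is given and the model only picks the best candidate there. In `detect`, the "obvious sequential way" is approximated by testing every position against its variants, so every extra candidate is another chance to flag a correct word.

```python
import math
from collections import Counter

# Hypothetical toy data; the paper's corpus and candidate sets are not reproduced here.
CORPUS = [
    "they left their bags at the door",
    "there is a piece of cake",
    "they want peace and quiet",
    "they left their car at home",
]
CONFUSION_SETS = [{"their", "there"}, {"piece", "peace"}]


def variants(word):
    """Return the other members of the word's (hypothetical) confusion set, if any."""
    for s in CONFUSION_SETS:
        if word in s:
            return sorted(s - {word})
    return []


class TrigramModel:
    """Toy trigram language model with add-one (Laplace) smoothing (an assumption)."""

    def __init__(self, sentences):
        self.tri = Counter()  # counts of (w1, w2, w3)
        self.ctx = Counter()  # counts of (w1, w2) as a trigram context
        self.vocab = set()
        for s in sentences:
            toks = ["<s>", "<s>"] + s.split() + ["</s>"]
            self.vocab.update(toks)
            for i in range(2, len(toks)):
                self.tri[tuple(toks[i - 2:i + 1])] += 1
                self.ctx[tuple(toks[i - 2:i])] += 1

    def logprob(self, words):
        """Smoothed log-probability of a whole sentence given as a list of words."""
        toks = ["<s>", "<s>"] + list(words) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for i in range(2, len(toks)):
            num = self.tri[tuple(toks[i - 2:i + 1])] + 1
            den = self.ctx[tuple(toks[i - 2:i])] + v
            total += math.log(num / den)
        return total


def select(model, words, i, candidates):
    """Selection: position i is assumed to be the error; choose the best candidate."""
    return max(candidates, key=lambda c: model.logprob(words[:i] + [c] + words[i + 1:]))


def detect(model, words, variants, margin=0.0):
    """Naive sequential detection: every word is treated as a potential error."""
    base = model.logprob(words)
    flags = []
    for i, w in enumerate(words):
        best, best_score = w, base
        for c in variants(w):
            s = model.logprob(words[:i] + [c] + words[i + 1:])
            if s > best_score:
                best, best_score = c, s
        # Flag only if some variant beats the original sentence by more than the margin.
        if best != w and best_score > base + margin:
            flags.append((i, w, best))
    return flags


if __name__ == "__main__":
    model = TrigramModel(CORPUS)
    typo = "they left there bags at the door".split()
    print(select(model, typo, 2, ["their", "there"]))  # their
    print(detect(model, typo, variants))  # [(2, 'there', 'their')]
```

With only two small confusion sets, the detection loop behaves well on this toy input. The abstract's claim concerns the realistic case, where every word has a large set of candidate corrections: there, this kind of position-by-position use of a trigram model is reported to introduce more errors than it finds.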

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1408.3153 [cs.CL]
	(or arXiv:1408.3153v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1408.3153

Submission history

From: L. Amber Wilcox-O'Hearn
[v1] Wed, 13 Aug 2014 22:09:23 UTC (23 KB)
[v2] Fri, 15 Aug 2014 15:06:38 UTC (23 KB)
