Evaluation of Neural Network Classification Systems on Document Stream

Voerman, Joris; Joseph, Aurelie; Coustaty, Mickael; Andecy, Vincent Poulain d; Ogier, Jean-Marc

Computer Science > Computer Vision and Pattern Recognition

arXiv:2007.07547 (cs)

[Submitted on 15 Jul 2020]


Abstract: One major drawback of state-of-the-art Neural Network (NN)-based approaches to document classification is the large number of training samples required to obtain an efficient classification. The minimum required number is around one thousand annotated documents for each class. In many cases it is very difficult, if not impossible, to gather this number of samples in real industrial processes. In this paper, we analyse the efficiency of NN-based document classification systems in a sub-optimal training case, based on the situation of a company document stream. We evaluated three different approaches, one based on image content and two on textual content. The evaluation was divided into four parts: a reference case, to assess the performance of the system in the lab; two cases that each simulate a specific difficulty linked to document stream processing; and a realistic case that combines all of these difficulties. The realistic case highlighted a significant drop in the efficiency of NN-based document classification systems. Although they remain efficient for well-represented classes (with over-fitting of the system to those classes), they cannot appropriately handle less well-represented classes. NN-based document classification systems need to be adapted to resolve these two problems before they can be considered for use in a company document stream.

Comments:	15 pages, 3 figures and submitted to DAS conferences 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
ACM classes:	I.7.1; J.1
Cite as:	arXiv:2007.07547 [cs.CV]
	(or arXiv:2007.07547v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.07547

Submission history

From: Joris Voerman
[v1] Wed, 15 Jul 2020 08:52:39 UTC (170 KB)
