Showing 1–2 of 2 results for author: O'Cuinn, M

Search v0.5.6 released 2020-02-24

arXiv:2306.00750 [pdf, other]

cs.IR cs.AI cs.LG

End-to-End Document Classification and Key Information Extraction using Assignment Optimization

Authors: Ciaran Cooney, Joana Cavadas, Liam Madigan, Bradley Savage, Rachel Heyburn, Mairead O'Cuinn

Abstract: We propose end-to-end document classification and key information extraction (KIE) for automating document processing in forms. Through accurate document classification we harness known information from templates to enhance KIE from forms. We use text and layout encoding with a cosine similarity measure to classify visually-similar documents. We then demonstrate a novel application of mixed integer programming by using assignment optimization to extract key information from documents. Our approach is validated on an in-house dataset of noisy scanned forms. The best performing document classification approach achieved 0.97 f1 score. A mean f1 score of 0.94 for the KIE task suggests there is significant potential in applying optimization techniques. Ablation results show that the method relies on document preprocessing techniques to mitigate Type II errors and achieve optimal performance.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 10 pages, 5 figures

arXiv:2211.06168 [pdf, other]

cs.CL

Unimodal and Multimodal Representation Training for Relation Extraction

Authors: Ciaran Cooney, Rachel Heyburn, Liam Madigan, Mairead O'Cuinn, Chloe Thompson, Joana Cavadas

Abstract: Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.

Submitted 11 November, 2022; originally announced November 2022.

Comments: 11 pages, 3 figures, 30th Irish Conference on Artificial Intelligence and Cognitive Science
