
Showing 1–4 of 4 results for author: Babkin, P

Searching in archive cs.
  1. arXiv:2404.04003

    cs.CL

    BuDDIE: A Business Document Dataset for Multi-task Information Extraction

    Authors: Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: The field of visually rich document understanding (VRDU) aims to solve a multitude of well-researched NLP tasks in a multi-modal domain. Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia. These datasets cover documents like invoices and receipts with sparse ann…

    Submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2401.00908

    cs.CL

    DocLLM: A layout-aware generative language model for multimodal document understanding

    Authors: Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

    Abstract: Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: 16 pages, 4 figures

  3. arXiv:2305.18607

    cs.SE cs.AI cs.CR

    How Effective Are Neural Networks for Fixing Security Vulnerabilities

    Authors: Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah

    Abstract: Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to stud…

    Submitted 1 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: This paper was accepted in the proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023) and was presented at the conference, which was held in Seattle, USA, 17-21 July 2023.

  4. arXiv:2212.12584

    cs.CL

    Neural Transition-based Parsing of Library Deprecations

    Authors: Petr Babkin, Nacho Navarro, Salwa Alamir, Sameena Shah

    Abstract: This paper tackles the challenging problem of automating code updates to fix deprecated API usages of open source libraries by analyzing their release notes. Our system employs a three-tier architecture: first, a web crawler service retrieves deprecation documentation from the web; then a specially built parser processes those text documents into tree-structured representations; finally, a client…

    Submitted 23 December, 2022; originally announced December 2022.

    Comments: 11 pages + references and appendix (14 total). This is an edited version of our rejected submission to ESEC/FSE 2022, revised to include a citation of our earlier short paper and to remove all content pertaining to the demo paper submission currently under review for ICSE 2023.