
Showing 1–6 of 6 results for author: Jurkiewicz, D

  1. arXiv:2305.08455  [pdf, other]

    cs.CV cs.CL cs.LG

    Document Understanding Dataset and Evaluation (DUDE)

    Authors: Jordy Van Landeghem, Rubén Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny, Matthew Blaschko, Sien Moens, Tomasz Stanisławek

    Abstract: We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remediate the halted research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document…

    Submitted 11 September, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: Accepted at ICCV 2023

  2. arXiv:2206.04045  [pdf, other]

    cs.CL cs.LG

    STable: Table Generation Framework for Encoder-Decoder Models

    Authors: Michał Pietruszka, Michał Turski, Łukasz Borchmann, Tomasz Dwojak, Gabriela Pałka, Karolina Szyndler, Dawid Jurkiewicz, Łukasz Garncarek

    Abstract: The output structure of database-like tables, consisting of values structured in horizontal rows and vertical columns identifiable by name, can cover a wide range of NLP tasks. Following this constatation, we propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population. The permutatio…

    Submitted 12 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  3. arXiv:2102.09550  [pdf, other]

    cs.CL cs.LG

    Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

    Authors: Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka

    Abstract: We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attenti…

    Submitted 12 July, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: Accepted at ICDAR 2021

  4. arXiv:2010.14464  [pdf, other]

    cs.DS cs.CL cs.IR

    Dynamic Boundary Time Warping for Sub-sequence Matching with Few Examples

    Authors: Łukasz Borchmann, Dawid Jurkiewicz, Filip Graliński, Tomasz Górecki

    Abstract: The paper presents a novel method of finding a fragment in a long temporal sequence similar to the set of shorter sequences. We are the first to propose an algorithm for such a search that does not rely on computing the average sequence from query examples. Instead, we use query examples as is, utilizing all of them simultaneously. The introduced method based on the Dynamic Time Warping (DTW) tech…

    Submitted 27 October, 2020; originally announced October 2020.

  5. arXiv:2005.07934  [pdf, other]

    cs.CL

    ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them

    Authors: Dawid Jurkiewicz, Łukasz Borchmann, Izabela Kosmala, Filip Graliński

    Abstract: This paper presents the winning system for the propaganda Technique Classification (TC) task and the second-placed system for the propaganda Span Identification (SI) task. The purpose of TC task was to identify an applied propaganda technique given propaganda text fragment. The goal of SI task was to find specific text fragments which contain at least one propaganda technique. Both of the develope…

    Submitted 5 September, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

  6. arXiv:1911.03911  [pdf, other]

    cs.CL

    Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines

    Authors: Łukasz Borchmann, Dawid Wiśniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński

    Abstract: We propose a new shared task of semantic retrieval from legal texts, in which a so-called contract discovery is to be performed, where legal clauses are extracted from documents, given a few examples of similar clauses from other legal acts. The task differs substantially from conventional NLI and shared tasks on legal information extraction (e.g., one has to identify text span instead of a single…

    Submitted 8 October, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: Submitted to Findings of EMNLP