Showing 1–4 of 4 results for author: Turski, M

  1. arXiv:2304.14953

    cs.CL

    CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data

    Authors: Michał Turski, Tomasz Stanisławek, Karol Kaczmarek, Paweł Dyda, Filip Graliński

    Abstract: In recent years, the field of document understanding has progressed a lot. A significant part of this progress has been possible thanks to the use of language models pretrained on large amounts of documents. However, pretraining corpora used in the domain of document understanding are single domain, monolingual, or nonpublic. Our goal in this paper is to propose an efficient pipeline for creating…

    Submitted 6 June, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: Accepted at ICDAR 2023

  2. arXiv:2304.04923

    cs.RO

    Staged Contact Optimization: Combining Contact-Implicit and Multi-Phase Hybrid Trajectory Optimization

    Authors: Michael R. Turski, Joseph Norby, Aaron M. Johnson

    Abstract: Trajectory optimization problems for legged robots are commonly formulated with fixed contact schedules. These multi-phase Hybrid Trajectory Optimization (HTO) methods result in locally optimal trajectories, but the result depends heavily upon the predefined contact mode sequence. Contact-Implicit Optimization (CIO) offers a potential solution to this issue by allowing the contact mode to be deter…

    Submitted 17 September, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Published at the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

  3. arXiv:2206.04045

    cs.CL cs.LG

    STable: Table Generation Framework for Encoder-Decoder Models

    Authors: Michał Pietruszka, Michał Turski, Łukasz Borchmann, Tomasz Dwojak, Gabriela Pałka, Karolina Szyndler, Dawid Jurkiewicz, Łukasz Garncarek

    Abstract: The output structure of database-like tables, consisting of values structured in horizontal rows and vertical columns identifiable by name, can cover a wide range of NLP tasks. Following this constatation, we propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population. The permutatio…

    Submitted 12 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  4. LAMBERT: Layout-Aware (Language) Modeling for information extraction

    Authors: Łukasz Garncarek, Rafał Powalski, Tomasz Stanisławek, Bartosz Topolski, Piotr Halama, Michał Turski, Filip Graliński

    Abstract: We introduce a simple new approach to the problem of understanding documents where non-trivial layout influences the local semantics. To this end, we modify the Transformer encoder architecture in a way that allows it to use layout features obtained from an OCR system, without the need to re-learn language semantics from scratch. We only augment the input of the model with the coordinates of token…

    Submitted 28 May, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted to ICDAR 2021

    Journal ref: In: Lladós J., Lopresti D., Uchida S. (eds) Document Analysis and Recognition - ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham