
Showing 1–4 of 4 results for author: Kaliska, A

  1. arXiv:2204.07775  [pdf, other]

    cs.CL cs.AI cs.LG

    TASTEset -- Recipe Dataset and Food Entities Recognition Benchmark

    Authors: Ania Wróblewska, Agnieszka Kaliska, Maciej Pawłowski, Dawid Wiśniewski, Witold Sosnowski, Agnieszka Ławrynowicz

    Abstract: Food Computing is currently a fast-growing field of research. Natural language processing (NLP) is also increasingly essential in this field, especially for recognising food entities. However, there are still only a few well-defined tasks that serve as benchmarks for solutions in this area. We introduce a new dataset -- called TASTEset -- to bridge this gap. In this dataset, Named Entity…

    Submitted 16 April, 2022; originally announced April 2022.

  2. Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts

    Authors: Tomasz Stanisławek, Filip Graliński, Anna Wróblewska, Dawid Lipiński, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemysław Biecek

    Abstract: The relevance of the Key Information Extraction (KIE) task is increasingly important in natural language processing problems. But there are still only a few well-defined problems that serve as benchmarks for solutions in this area. To bridge this gap, we introduce two new datasets (Kleister NDA and Kleister Charity). They involve a mix of scanned and born-digital long formal English-language docum…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: accepted to ICDAR 2021

    Journal ref: International Conference on Document Analysis and Recognition ICDAR 2021

  3. arXiv:2003.02356  [pdf, other]

    cs.CL

    Kleister: A novel task for Information Extraction involving Long Documents with Complex Layout

    Authors: Filip Graliński, Tomasz Stanisławek, Anna Wróblewska, Dawid Lipiński, Agnieszka Kaliska, Paulina Rosalska, Bartosz Topolski, Przemysław Biecek

    Abstract: State-of-the-art solutions for Natural Language Processing (NLP) are able to capture a broad range of contexts, like the sentence-level context or document-level context for short documents. But these solutions are still struggling when it comes to longer, real-world documents with the information encoded in the spatial structure of the document, such as page elements like tables, forms, headers,…

    Submitted 6 March, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  4. arXiv:1911.03911  [pdf, other]

    cs.CL

    Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines

    Authors: Łukasz Borchmann, Dawid Wiśniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński

    Abstract: We propose a new shared task of semantic retrieval from legal texts, in which a so-called contract discovery is to be performed, where legal clauses are extracted from documents, given a few examples of similar clauses from other legal acts. The task differs substantially from conventional NLI and shared tasks on legal information extraction (e.g., one has to identify text span instead of a single…

    Submitted 8 October, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: Submitted to Findings of EMNLP