Showing 1–44 of 44 results for author: Lladós

  1. arXiv:2406.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach

    Authors: Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas

    Abstract: The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (D…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ICDAR 2024 (Athens, Greece) Workshop on Automatically Domain-Adapted and Personalized Document Analysis (ADAPDA)

  2. arXiv:2406.08354  [pdf, other]

    cs.CV cs.AI cs.LG

    DocSynthv2: A Practical Autoregressive Modeling for Document Generation

    Authors: Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

    Abstract: While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both la…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Spotlight (Oral) Acceptance to CVPR 2024 Workshop for Graphic Design Understanding and Generation (GDUG)

  3. arXiv:2406.08226  [pdf, other]

    cs.CV cs.AI cs.LG

    DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

    Authors: Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas

    Abstract: This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ICDAR 2024 (Athens, Greece)

  4. arXiv:2406.07315  [pdf, other]

    cs.IR cs.CV

    Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval

    Authors: Adrià Molina, Oriol Ramos Terrades, Josep Lladós

    Abstract: This paper introduces Fetch-A-Set (FAS), a comprehensive benchmark tailored for legislative historical document analysis systems, addressing the challenges of large-scale document retrieval in historical contexts. The benchmark comprises a vast repository of documents dating back to the XVII century, serving both as a training resource and an evaluation benchmark for retrieval systems. It fills a…

    Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Preprint for the manuscript accepted for publication in the DAS2024 LNCS proceedings

  5. arXiv:2405.03104  [pdf, other]

    cs.CV

    GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding

    Authors: Nil Biescas, Carlos Boned, Josep Lladós, Sanket Biswas

    Abstract: This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features. We propose a novel methodology that combines geometric edge features with visual features within an overall two-staged GAT-based framework, demonstrat…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024 (Athens, Greece)

  6. arXiv:2405.03099  [pdf, other]

    cs.CV

    SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition

    Authors: Adarsh Tiwari, Sanket Biswas, Josep Lladós

    Abstract: We present SketchGPT, a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation and completion, and an interpretation case study for sketch recognition. By mapping complex sketches into simplified sequences of abstract primitives, our approach significantly streamlines the input for autoregressive modeling. SketchGPT leverages the next token prediction ob…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024

  7. arXiv:2404.00412  [pdf, other]

    cs.CV cs.LG

    SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

    Authors: Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta

    Abstract: Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vect…

    Submitted 30 March, 2024; originally announced April 2024.

  8. arXiv:2402.11401  [pdf, other]

    cs.CV cs.LG

    GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constr…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  9. I Can't Believe It's Not Better: In-air Movement For Alzheimer Handwriting Synthetic Generation

    Authors: Asma Bensalah, Antonio Parziale, Giuseppe De Gregorio, Angelo Marcelli, Alicia Fornés, Lladós

    Abstract: In recent years, there has been a boom in the use of deep learning for handwriting analysis and recognition. One main application of handwriting analysis is early detection and diagnosis in the health field. Unfortunately, most real-case problems still suffer from a scarcity of data, which makes the use of deep learning-based models difficult. To alleviate this problem, some works resort to…

    Submitted 8 December, 2023; originally announced December 2023.

  10. arXiv:2310.00917  [pdf, other]

    cs.CV

    Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

    Authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya

    Abstract: The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here…

    Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  11. arXiv:2310.00558  [pdf, other]

    cs.CV

    Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

    Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós

    Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of d…

    Submitted 17 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024

  12. arXiv:2309.05756  [pdf, other]

    cs.CV

    TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language

    Authors: Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós

    Abstract: The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on O…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Preprint to Pattern Recognition

  13. SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Instance-level segmentation of documents consists of assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for document understanding. In this paper, we present a unified transformer encoder-decoder architecture for end-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed qu…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ICDAR 2023 (San Jose, California)

  14. SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

    Authors: Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

    Abstract: Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal…

    Submitted 20 August, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

    Journal ref: ICDAR 2023 (International Conference on Document Analysis and Recognition) Lecture Notes in Computer Science, vol 14187, pp. 342-360. Springer Nature

  15. Easing Automatic Neurorehabilitation via Classification and Smoothness Analysis

    Authors: Asma Bensalah, Alicia Fornés, Cristina Carmona-Duarte, Josep Lladós

    Abstract: Assessing the quality of movements for post-stroke patients during the rehabilitation phase is vital given that there is no standard stroke rehabilitation plan for all the patients. In fact, it depends basically on the patient's functional independence and its progress along the rehabilitation sessions. To tackle this challenge and make neurorehabilitation more agile, we propose an automatic asses…

    Submitted 9 December, 2022; originally announced December 2022.

  16. The RPM3D project: 3D Kinematics for Remote Patient Monitoring

    Authors: Alicia Fornés, Asma Bensalah, Cristina Carmona-Duarte, Jialuo Chen, Miguel A. Ferrer, Andreas Fischer, Josep Lladós, Cristina Martín, Eloy Opisso, Réjean Plamondon, Anna Scius-Bertrand, Josep Maria Tormos

    Abstract: This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute (neurorehabilitation hospital), showing promising results. Our work could have a great im…

    Submitted 9 December, 2022; originally announced December 2022.

  17. Towards Stroke Patients' Upper-limb Automatic Motor Assessment Using Smartwatches

    Authors: Asma Bensalah, Jialuo Chen, Alicia Fornés, Cristina Carmona-Duarte, Josep Lladós, Miguel A. Ferrer

    Abstract: Assessing the physical condition in rehabilitation scenarios is a challenging problem, since it involves Human Activity Recognition (HAR) and kinematic analysis methods. In addition, the difficulties increase in unconstrained rehabilitation scenarios, which are much closer to the real use cases. In particular, our aim is to design an upper-limb assessment pipeline for stroke patients using smartwa…

    Submitted 9 December, 2022; originally announced December 2022.

  18. arXiv:2212.00554  [pdf, other]

    cs.LG cs.AI

    Early prediction of the risk of ICU mortality with Deep Federated Learning

    Authors: Korbinian Randl, Núria Lladós Armengol, Lena Mondrejevski, Ioanna Miliou

    Abstract: Intensive Care Units usually carry patients with a serious risk of mortality. Recent research has shown the ability of Machine Learning to indicate the patients' mortality risk and point physicians toward individuals with a heightened need for care. Nevertheless, healthcare data is often subject to privacy regulations and can therefore not be easily shared in order to build Centralized Machine Lea…

    Submitted 5 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: 12 pages

  19. arXiv:2209.10441  [pdf, other]

    cs.CV

    A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts

    Authors: Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Lladós, Alicia Fornés, Angelo Marcelli

    Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted in ICFHR 2022

  20. Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks

    Authors: Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Lladós, Simone Marinai

    Abstract: Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-drive…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: TiE ECCV 2022

    Journal ref: Computer Vision -- ECCV 2022 Workshops, vol. 13804, 2023, pp. 329-344

  21. arXiv:2204.04028  [pdf, other]

    cs.CV cs.DL cs.IR

    A Generic Image Retrieval Method for Date Estimation of Historical Document Collections

    Authors: Adrià Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Lladós

    Abstract: Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Preprint of paper accepted at DAS2022

  22. arXiv:2203.04814  [pdf, other]

    cs.CV

    Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

    Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labeled data. E…

    Submitted 18 August, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Preprint

  23. arXiv:2201.11438  [pdf, other]

    cs.CV

    DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

    Authors: Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal

    Abstract: Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (title, sections, figures etc.) has emerged as an interesting problem fo…

    Submitted 21 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Preprint

  24. arXiv:2201.10252  [pdf, other]

    cs.CV

    DocEnTr: An End-to-End Document Image Enhancement Transformer

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal

    Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: submitted to ICPR 2022

  25. arXiv:2109.11874  [pdf, other]

    cs.CV cs.GR

    Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild

    Authors: Pau Riba, Sounak Dey, Ali Furkan Biten, Josep Llados

    Abstract: This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a tough-to-beat baseline that without any specific SGOL training is able to outperform the previous works on a fixed set of classes. The baseline is useful to analyze the…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: Under Review

  26. arXiv:2107.04357  [pdf, other]

    cs.CV cs.LG

    Graph-based Deep Generative Modelling for Document Layout Generation

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR Workshops-GLESDO 2021

  27. arXiv:2107.02638  [pdf, other]

    cs.CV

    DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR 2021

  28. arXiv:2106.05618  [pdf, other]

    cs.CV

    Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach

    Authors: Adrià Molina, Pau Riba, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós

    Abstract: This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate the date estimation as a retrieval task, where given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer their embedded representations, the closer their dates. Contrary to the traditional models that design…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at ICDAR 2021

  29. arXiv:2106.05144  [pdf, other]

    cs.CV cs.IR

    Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting

    Authors: Pau Riba, Adrià Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós

    Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the qu…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted at ICDAR 2021

  30. arXiv:2105.05300  [pdf, other]

    cs.CV

    One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

    Authors: Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

    Abstract: Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This is the case, for example, of historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. Thus, in this paper we address this problem through a data generation technique…

    Submitted 5 October, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted in WACV 2022

  31. arXiv:2008.07641  [pdf, other]

    cs.CV cs.LG

    Learning Graph Edit Distance by Graph Neural Networks

    Authors: Pau Riba, Andreas Fischer, Josep Lladós, Alicia Fornés

    Abstract: The emergence of geometric deep learning as a novel framework to deal with graph-based representations has displaced traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine the advances on deep metric learning with traditional approximations of the graph edit distance. Hence, we propose an efficient graph distance based on the nov…

    Submitted 17 August, 2020; originally announced August 2020.

  32. arXiv:1912.10016  [pdf, other]

    cs.CV

    A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages

    Authors: Manuel Carbonell, Alicia Fornés, Mauricio Villegas, Josep Lladós

    Abstract: In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought big improvements in the performance of each of the tasks involved in this process, consisting of text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propo…

    Submitted 4 May, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: To be published in Pattern Recognition Letters

  33. arXiv:1910.08993  [pdf, other]

    cs.CV

    Identity Document and banknote security forensics: a survey

    Authors: Albert Berenguel Centeno, Oriol Ramos Terrades, Josep Lladós Canet, Cristina Cañero Morales

    Abstract: Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. Banknotes and identity documents are two common objects of counterfeiting. Aiming to detect these counterfeits, the present survey covers a wide range of anti-counterfeiting security features, categorizing them into three components: security substrate, security inks and security printing. F…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: 35 pages, 5 figures, 14 tables

  34. Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval

    Authors: Sounak Dey, Pau Riba, Anjan Dutta, Josep Llados, Yi-Zhe Song

    Abstract: In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected c…

    Submitted 19 August, 2020; v1 submitted 6 April, 2019; originally announced April 2019.

    Comments: Oral paper in CVPR 2019

    Journal ref: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  35. arXiv:1807.02839  [pdf, other]

    cs.CV cs.LG stat.ML

    Hierarchical stochastic graphlet embedding for graph-based pattern recognition

    Authors: Anjan Dutta, Pau Riba, Josep Lladós, Alicia Fornés

    Abstract: Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable because of the lack of mathematical operations defined in graph domain. Graph embedding, which maps graphs to a vectorial space, has been proposed as a way to tackle these difficulties enabling the use of standard machine learning techniques. However, it is well known…

    Submitted 4 January, 2020; v1 submitted 8 July, 2018; originally announced July 2018.

    Comments: In Neural Computing and Applications (17 pages, 5 figures, 6 tables)

  36. arXiv:1804.10819  [pdf, other]

    cs.CV

    Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

    Authors: Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

    Abstract: In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is…

    Submitted 28 April, 2018; originally announced April 2018.

    Comments: Accepted at ICPR 2018

  37. arXiv:1803.06252  [pdf, other]

    cs.CV cs.CL

    Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

    Authors: Manuel Carbonell, Mauricio Villegas, Alicia Fornés, Josep Lladós

    Abstract: When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate sequential tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognitio…

    Submitted 22 March, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: To appear in IAPR International Workshop on Document Analysis Systems 2018 (DAS 2018)

  38. arXiv:1801.07138  [pdf]

    cs.SI cs.CY

    Wikipedia in academia as a teaching tool: from averse to proactive faculty profiles

    Authors: Julià Minguillón, Eduard Aibar, Maura Lerga, Josep Lladós, Antoni Meseguer-Artola

    Abstract: This study concerned the active use of Wikipedia as a teaching tool in the classroom in higher education, trying to identify different usage profiles and their characterization. A questionnaire survey was administered to all full-time and part-time teachers at the Universitat Oberta de Catalunya and the Universitat Pompeu Fabra, both in Barcelona, Spain. The questionnaire was designed using the T…

    Submitted 22 January, 2018; originally announced January 2018.

    Comments: 16 pages, 1 figure, 8 tables

    ACM Class: K.3.1; H.3.5

  39. arXiv:1708.06126  [pdf, other]

    cs.CV

    e-Counterfeit: a mobile-server platform for document counterfeit detection

    Authors: Albert Berenguel, Oriol Ramos Terrades, Josep Lladós, Cristina Cañero

    Abstract: This paper presents a novel application to detect counterfeit identity documents forged by a scan-printing operation. Texture analysis approaches are proposed to extract validation features from the security background that is usually printed in documents such as IDs or banknotes. The main contribution of this work is the end-to-end mobile-server architecture, which provides a service for non-expert users…

    Submitted 21 August, 2017; originally announced August 2017.

    Comments: 6 pages, 5 figures

  40. arXiv:1707.02131  [pdf, other]

    cs.CV

    SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification

    Authors: Sounak Dey, Anjan Dutta, J. Ignacio Toledo, Suman K. Ghosh, Josep Llados, Umapada Pal

    Abstract: Offline signature verification is one of the most challenging tasks in biometrics and document forensics. Unlike other verification problems, it needs to model minute but critical details between genuine and forged signatures, because a skilled falsification might often resemble the real signature with small deformation. This verification task is even harder in writer independent scenarios which…

    Submitted 30 September, 2017; v1 submitted 7 July, 2017; originally announced July 2017.

  41. arXiv:1702.00391  [pdf, other]

    cs.CV

    Product Graph-based Higher Order Contextual Similarities for Inexact Subgraph Matching

    Authors: Anjan Dutta, Josep Lladós, Horst Bunke, Umapada Pal

    Abstract: Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the te…

    Submitted 1 February, 2017; originally announced February 2017.

  42. arXiv:1604.06243  [pdf, other]

    cs.CV

    Evaluation of the Effect of Improper Segmentation on Word Spotting

    Authors: Sounak Dey, Anguelos Nicolaou, Josep Llados, Umapada Pal

    Abstract: Word spotting is an important recognition task in historical document analysis. In most cases methods are developed and evaluated assuming perfect word segmentations. In this paper we propose an experimental framework to quantify the effect that the quality of word segmentation has on the performance achieved by word spotting methods in identical unbiased conditions. The framework consists of generatin…

    Submitted 21 April, 2016; originally announced April 2016.

  43. arXiv:1604.05907  [pdf, other]

    cs.CV

    Local Binary Pattern for Word Spotting in Handwritten Historical Document

    Authors: Sounak Dey, Anguelos Nicolaou, Josep Llados, Umapada Pal

    Abstract: Digital libraries store images which can be highly degraded, and to index such images we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a…

    Submitted 21 April, 2016; v1 submitted 20 April, 2016; originally announced April 2016.

  44. arXiv:1004.5427  [pdf]

    cs.CV cs.GR

    Employing fuzzy intervals and loop-based methodology for designing structural signature: an application to symbol recognition

    Authors: Muhammad Muzzamil Luqman, Mathieu Delalandre, Thierry Brouard, Jean-Yves Ramel, Josep Lladós

    Abstract: The motivation of our work is to present a new methodology for symbol recognition. We support structural methods for representing visual associations in graphic documents. The proposed method employs a structural approach for symbol representation and a statistical classifier for recognition. We vectorize a graphic symbol, encode its topological and geometrical information by an ARG and compute a sign…

    Submitted 29 April, 2010; originally announced April 2010.

    Comments: 10 pages, Eighth IAPR International Workshop on Graphics RECognition (GREC), 2009, volume 8, 22-31