
Showing 1–12 of 12 results for author: Souibgui, M A

Searching in archive cs.
  1. arXiv:2404.19031

    cs.CV cs.AI

    Machine Unlearning for Document Classification

    Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to ICDAR2024

  2. arXiv:2312.10108

    cs.CV cs.AI cs.LG

    Privacy-Aware Document Visual Question Answering

    Authors: Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

    Abstract: Document Visual Question Answering (DocVQA) is a fast-growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees. In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state-of-the-art multi-modal LLM models used fo…

    Submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2303.09347   

    cs.CV

    CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition

    Authors: Marwa Dhiaf, Mohamed Ali Souibgui, Kai Wang, Yuyang Liu, Yousri Kessentini, Alicia Fornés, Ahmed Cheikh Rouhou

    Abstract: Self-supervised learning has recently emerged as a strong alternative in document analysis. These approaches are now capable of learning high-quality image representations and overcoming the limitations of supervised methods, which require a large amount of labeled data. However, these methods are unable to capture new knowledge in an incremental fashion, where data is presented to the model seque…

    Submitted 26 April, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Due to current company policy constraints, we are compelled to withdraw our paper. The organization's guidelines prohibit us from proceeding with the publication of this work at this time. We apologize for any inconvenience this may cause and appreciate your understanding in this matter.

  4. arXiv:2303.03127

    cs.CV

    ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

    Authors: Sana Khamekhem Jemni, Sourour Ammar, Mohamed Ali Souibgui, Yousri Kessentini, Abbas Cheddad

    Abstract: Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques that require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we in…

    Submitted 6 March, 2023; originally announced March 2023.

  5. arXiv:2209.10441

    cs.CV

    A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts

    Authors: Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma Bensalah, Josep Lladós, Alicia Fornés, Angelo Marcelli

    Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted in ICFHR 2022

  6. arXiv:2203.04814

    cs.CV

    Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

    Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the use of labeled data. E…

    Submitted 18 August, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Preprint

  7. arXiv:2201.10252

    cs.CV

    DocEnTr: An End-to-End Document Image Enhancement Transformer

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal

    Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: submitted to ICPR 2022

  8. Few Shots Are All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwritten Text Recognition

    Authors: Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini, Beáta Megyesi

    Abstract: Handwritten text recognition in low-resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. The main difficulty comes from the very few annotated data and the limited linguistic information (e.g., dictionaries and language models). Thus, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human labor annotation process,…

    Submitted 13 June, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Accepted in Pattern Recognition Letters

  9. Enhance to Read Better: A Multi-Task Adversarial Network for Handwritten Document Image Enhancement

    Authors: Sana Khamekhem Jemni, Mohamed Ali Souibgui, Yousri Kessentini, Alicia Fornés

    Abstract: Handwritten document images can be highly affected by degradation for different reasons: paper ageing, daily-life scenarios (wrinkles, dust, etc.), bad scanning processes and so on. These artifacts raise many readability issues for current Handwritten Text Recognition (HTR) algorithms and severely devalue their efficiency. In this paper, we propose an end-to-end architecture based on Generative Adve…

    Submitted 22 October, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted in Pattern Recognition

  10. arXiv:2105.05300

    cs.CV

    One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

    Authors: Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

    Abstract: Low-resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This is the case, for example, for historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. Thus, in this paper we address this problem through a data generation technique…

    Submitted 5 October, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted in WACV 2022

  11. DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement

    Authors: Mohamed Ali Souibgui, Yousri Kessentini

    Abstract: Documents often exhibit various forms of degradation, which make them hard to read and substantially deteriorate the performance of an OCR system. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this prac…

    Submitted 17 October, 2020; originally announced October 2020.

    Comments: Accepted in IEEE TPAMI

  12. arXiv:2009.12577

    cs.CV

    A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition

    Authors: Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini, Crina Tudor

    Abstract: Encoded (or ciphered) manuscripts are a special type of historical document that contain encrypted text. The automatic recognition of this kind of document is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpora for training, and 3) touching symbols make symbol segmentation difficult and complex. To overcome these difficultie…

    Submitted 26 September, 2020; originally announced September 2020.

    Comments: Accepted in the 25th International Conference on Pattern Recognition (ICPR2020), Milan, Italy, 10-15 January 2021 (Camera Ready Version)