Showing 1–50 of 72 results for author: Pal, U

Searching in archive cs.
  1. MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification

    Authors: Miguel A. Ferrer, Abhijit Das, Moises Diaz, Aythami Morales, Cristina Carmona-Duarte, Umapada Pal

    Abstract: Script identification plays a vital role in applications that involve handwriting and document analysis within a multi-script and multi-lingual environment. Moreover, it exhibits a profound connection with human cognition. This paper provides a new database for benchmarking script identification algorithms, which contains both printed and handwritten documents collected from a wide variety of scri…

    Submitted 29 May, 2024; originally announced May 2024.

    Journal ref: Cognitive Computation, Volume 16, pages 131 to 157 (2024)

  2. arXiv:2404.00412  [pdf, other]

    cs.CV cs.LG

    SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

    Authors: Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta

    Abstract: Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vect…

    Submitted 30 March, 2024; originally announced April 2024.

  3. arXiv:2402.11401  [pdf, other]

    cs.CV cs.LG

    GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constr…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  4. Static and Dynamic Synthesis of Bengali and Devanagari Signatures

    Authors: Miguel A. Ferrer, Sukalpa Chanda, Moises Diaz, Chayan Kr. Banerjee, Anirban Majumdar, Cristina Carmona-Duarte, Parikshit Acharya, Umapada Pal

    Abstract: Developing an automatic signature verification system is challenging and demands a large number of training samples. This is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, the well-established hypothesis from neuroscience, which analyses how a human being accomplishes movement. Specifically, a mot…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted version. Published on IEEE Transactions on Cybernetics [ISSN 2168-2267], v. 48(10), p. 2896-2907

    Journal ref: IEEE Transactions on Cybernetics, v. 48(10), p. 2896-2907, 2018

  5. arXiv:2312.03946  [pdf, other]

    cs.CV

    A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

    Authors: Risab Biswas, Swalpa Kumar Roy, Umapada Pal

    Abstract: Document image enhancement is a fundamental and important stage for attaining the best performance in any document analysis assignment because there are many degradation situations that could harm document images, making it more difficult to recognize and analyze them. In this paper, we propose T2T-BinFormer which is a novel document binarization encoder-decoder architecture based on a To…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2312.03568

  6. arXiv:2312.03568  [pdf, other]

    cs.CV

    DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

    Authors: Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang

    Abstract: In real life, various degradation scenarios exist that might damage document images, making it harder to recognize and analyze them, thus binarization is a fundamental and crucial step for achieving the most optimal performance in any document analysis task. We propose DocBinFormer (Document Binarization Transformer), a novel two-level vision transformer (TL-ViT) architecture based on vision trans…

    Submitted 6 December, 2023; originally announced December 2023.

  7. arXiv:2310.00917  [pdf, other]

    cs.CV

    Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

    Authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya

    Abstract: The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here…

    Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  8. arXiv:2310.00558  [pdf, other]

    cs.CV

    Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

    Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós

    Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of d…

    Submitted 17 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024

  9. arXiv:2308.02905  [pdf, other]

    cs.CV cs.MM

    FAST: Font-Agnostic Scene Text Editing

    Authors: Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

    Abstract: Scene Text Editing (STE) is a challenging research problem, and it aims to modify existing texts in an image while preserving the background and the font style of the original text of the image. Due to its various real-life applications, researchers have explored several approaches toward STE in recent years. However, most of the existing STE methods show inferior editing performance because of (1…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: 13 pages, in submission

  10. arXiv:2308.01140  [pdf, other]

    cs.LG cs.CV

    Dynamically Scaled Temperature in Self-Supervised Contrastive Learning

    Authors: Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal

    Abstract: In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples of different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regu…

    Submitted 10 May, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  11. SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for end-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed qu…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ICDAR 2023 (San Jose, California)

  12. arXiv:2305.04524  [pdf, other]

    cs.CV

    Scene Text Recognition with Image-Text Matching-guided Dictionary

    Authors: Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal

    Abstract: Employing a dictionary can efficiently rectify the deviation between the visual prediction and the ground truth in scene text recognition methods. However, the independence of the dictionary on the visual features may lead to incorrect rectification of accurate visual predictions. In this paper, we propose a new dictionary language model leveraging the Scene Image-Text Matching (SITM) network, whic…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at ICDAR2023

  13. SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

    Authors: Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

    Abstract: Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal…

    Submitted 20 August, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

    Journal ref: ICDAR 2023 (International Conference on Document Analysis and Recognition) Lecture Notes in Computer Science, vol 14187, pp. 342-360. Springer Nature

  14. arXiv:2304.11993  [pdf, other]

    cs.CV cs.MM

    MMC: Multi-Modal Colorization of Images using Textual Descriptions

    Authors: Subhankar Ghosh, Saumik Bhattacharya, Prasun Roy, Umapada Pal, Michael Blumenstein

    Abstract: Handling various objects with different colors is a significant challenge for image colorization techniques. Thus, for complex real-world scenes, the existing image colorization algorithms often fail to maintain color consistency. In this work, we attempt to integrate textual descriptions as an auxiliary condition, along with the grayscale image that is to be colorized, to improve the fidelity of…

    Submitted 25 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 9 pages

  15. arXiv:2304.04376  [pdf, other]

    cs.CV

    ICDAR 2023 Video Text Reading Competition for Dense and Small Text

    Authors: Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

    Abstract: Recently, video text detection, tracking, and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a…

    Submitted 10 April, 2023; originally announced April 2023.

    Journal ref: ICDAR 2023 competition

  16. arXiv:2303.07989  [pdf, other]

    cs.CV cs.HC

    A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing

    Authors: Prasun Roy, Subhankar Ghosh, Umapada Pal

    Abstract: Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom. This paper proposes a generic video camera-aided convolutional neural network (CNN) based air-writing framework. Gestures are performed using a marker of fixed color in front of a generic video camera, followed by color-based segmentation to identify the marke…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted in The International Conference on Frontiers of Handwriting Recognition (ICFHR) 2018

  17. arXiv:2302.14728  [pdf, other]

    cs.CV cs.MM

    Global Context-Aware Person Image Generation

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

    Abstract: We propose a data-driven approach for context-aware person image generation. Specifically, we attempt to generate a person image such that the synthesized instance can blend into a complex scene. In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene. The proposed technique is divided into three sequential steps.…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 14 pages

  18. arXiv:2208.02843  [pdf, other]

    cs.CV

    TIC: Text-Guided Image Colorization

    Authors: Subhankar Ghosh, Prasun Roy, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: Image colorization is a well-known problem in computer vision. However, due to the ill-posed nature of the task, image colorization is inherently challenging. Though several attempts have been made by researchers to make the colorization pipeline automatic, these processes often produce unrealistic results due to a lack of conditioning. In this work, we attempt to integrate textual descriptions as…

    Submitted 4 August, 2022; originally announced August 2022.

  19. arXiv:2207.11718  [pdf, other]

    cs.CV cs.MM

    TIPS: Text-Induced Pose Synthesis

    Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: Accepted in The European Conference on Computer Vision (ECCV) 2022

  20. arXiv:2207.10256  [pdf, other]

    cs.CV

    SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

    Authors: Dajian Zhong, Shujing Lyu, Palaiahnakote Shivakumara, Bing Yin, Jiajia Wu, Umapada Pal, Yue Lu

    Abstract: Scene text recognition is a challenging task due to the complex backgrounds and diverse variations of text instances. In this paper, we propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize the texts in scene images. The proposed method first generates the simple semantic feature using Semantic GAN and then recognizes the scene text with the Balanced Attention Module.…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  21. arXiv:2206.02717  [pdf, other]

    cs.CV cs.MM

    Scene Aware Person Image Generation through Global Contextual Conditioning

    Authors: Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

    Abstract: Person image generation is an intriguing yet challenging problem. However, this task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the pe…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: Accepted in The International Conference on Pattern Recognition (ICPR) 2022

  22. arXiv:2202.13078  [pdf, other]

    cs.CV cs.LG eess.IV

    SWIS: Self-Supervised Representation Learning For Writer Independent Offline Signature Verification

    Authors: Siladittya Manna, Soumitri Chattopadhyay, Saumik Bhattacharya, Umapada Pal

    Abstract: Writer independent offline signature verification is one of the most challenging tasks in pattern recognition as there is often a scarcity of training data. To handle such data scarcity problem, in this paper, we propose a novel self-supervised learning (SSL) framework for writer independent offline signature verification. To our knowledge, this is the first attempt to utilize self-supervised sett…

    Submitted 12 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted at IEEE ICIP 2022

  23. arXiv:2202.06777  [pdf, other]

    cs.CV cs.MM

    Multi-scale Attention Guided Pose Transfer

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

    Abstract: Pose transfer refers to the probabilistic image generation of a person with a previously unseen novel pose from another image of that person having a different pose. Due to potential academic and commercial applications, this problem is extensively studied in recent years. Among the various approaches to the problem, attention guided progressive generation is shown to produce state-of-the-art resu…

    Submitted 14 February, 2022; originally announced February 2022.

    Comments: 14 pages

  24. arXiv:2201.11438  [pdf, other]

    cs.CV

    DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

    Authors: Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal

    Abstract: Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (title, sections, figures etc.) has emerged as an interesting problem fo…

    Submitted 21 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Preprint

  25. arXiv:2201.10252  [pdf, other]

    cs.CV

    DocEnTr: An End-to-End Document Image Enhancement Transformer

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, Umapada Pal

    Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: submitted to ICPR 2022

  26. arXiv:2201.10138  [pdf, other]

    cs.CV

    SURDS: Self-Supervised Attention-guided Reconstruction and Dual Triplet Loss for Writer Independent Offline Signature Verification

    Authors: Soumitri Chattopadhyay, Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: Offline Signature Verification (OSV) is a fundamental biometric task across various forensic, commercial and legal applications. The underlying task at hand is to carefully model fine-grained features of the signatures to distinguish between genuine and forged ones, which differ only in minute deformities. This makes OSV more challenging compared to other verification problems. In this work, we pr…

    Submitted 26 June, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted at ICPR 2022

  27. arXiv:2111.12664  [pdf, other]

    cs.CV stat.ML

    MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

    Authors: Siladittya Manna, Umapada Pal, Saumik Bhattacharya

    Abstract: Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel mutual information optimization-based loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect and predict whether a pair is positive or negative. We further improve the naïve los…

    Submitted 9 March, 2023; v1 submitted 24 November, 2021; originally announced November 2021.

  28. arXiv:2111.10618  [pdf, other]

    eess.IV cs.CV

    PAANet: Progressive Alternating Attention for Automatic Medical Image Segmentation

    Authors: Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Michael A. Riegler, Pål Halvorsen, Dag Johansen, Umapada Pal

    Abstract: Medical image segmentation can provide detailed information for clinical analysis which can be useful for scenarios where the detailed location of a finding is important. Knowing the location of disease can play a vital role in treatment and decision-making. Convolutional neural network (CNN) based encoder-decoder techniques have advanced the performance of automated medical image segmentation sys…

    Submitted 20 November, 2021; originally announced November 2021.

  29. arXiv:2111.10614  [pdf, other]

    eess.IV cs.CV

    GMSRF-Net: An improved generalizability with global multi-scale residual fusion network for polyp segmentation

    Authors: Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Umapada Pal, Sharib Ali

    Abstract: Colonoscopy is a gold standard procedure but is highly operator-dependent. Efforts have been made to automate the detection and segmentation of polyps, a precancerous precursor, to effectively minimize missed rate. Widely used computer-aided polyp segmentation systems actuated by encoder-decoder have achieved high performance in terms of accuracy. However, polyp segmentation datasets collected fro…

    Submitted 20 November, 2021; originally announced November 2021.

  30. arXiv:2111.10605  [pdf, other]

    cs.CV

    Exploiting Multi-Scale Fusion, Spatial Attention and Patch Interaction Techniques for Text-Independent Writer Identification

    Authors: Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

    Abstract: Text independent writer identification is a challenging problem that differentiates between different handwriting styles to decide the author of the handwritten text. Earlier writer identification relied on handcrafted features to reveal pieces of differences between writers. Recently, with the advent of convolutional neural networks, deep learning-based methods have evolved. In this paper, three…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 14 pages, 4 figures

  31. arXiv:2111.10591  [pdf, other]

    cs.CV

    AGA-GAN: Attribute Guided Attention Generative Adversarial Network with U-Net for Face Hallucination

    Authors: Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

    Abstract: The performance of facial super-resolution methods relies on their ability to recover facial structures and salient features effectively. Even though the convolutional neural network and generative adversarial network-based methods deliver impressive performances on face hallucination tasks, the ability to use attributes associated with the low-resolution images to improve performance is unsatisfa…

    Submitted 20 November, 2021; originally announced November 2021.

    Comments: 27 pages, 9 Figures

  32. arXiv:2108.09335  [pdf, other]

    cs.CV cs.LG

    LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning

    Authors: Bhavya Vasudeva, Puneesh Deora, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

    Abstract: Deep metric learning has been effectively used to learn distance metrics for different visual tasks like image retrieval, clustering, etc. In order to aid the training process, existing methods either use a hard mining strategy to extract the most informative samples or seek to generate hard synthetics using an additional network. Such approaches face different challenges and can lead to biased em…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: 17 pages, 9 figures, 5 tables. Accepted at The IEEE/CVF International Conference on Computer Vision (ICCV) 2021

  33. arXiv:2107.04357  [pdf, other]

    cs.CV cs.LG

    Graph-based Deep Generative Modelling for Document Layout Generation

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR Workshops-GLESDO 2021

  34. arXiv:2107.02638  [pdf, other]

    cs.CV

    DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

    Authors: Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

    Abstract: Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: Accepted by ICDAR 2021

  35. arXiv:2105.09909  [pdf, other]

    cs.CV cs.AI cs.NE

    PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection

    Authors: Dipayan Das, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

    Abstract: Reservoir Computing (RC) offers a viable option to deploy AI algorithms on low-end embedded system platforms. Liquid State Machine (LSM) is a bio-inspired RC model that mimics the cortical microcircuits and uses spiking neural networks (SNN) that can be directly realized on neuromorphic hardware. In this paper, we present a novel Parallelized LSM (PLSM) architecture that incorporates spatio-tempor…

    Submitted 6 May, 2021; originally announced May 2021.

  36. arXiv:2105.07451  [pdf, other]

    eess.IV cs.CV

    MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation

    Authors: Abhishek Srivastava, Debesh Jha, Sukalpa Chanda, Umapada Pal, Håvard D. Johansen, Dag Johansen, Michael A. Riegler, Sharib Ali, Pål Halvorsen

    Abstract: Methods based on convolutional neural networks have improved the performance of biomedical image segmentation. However, most of these methods cannot efficiently segment objects of variable sizes and train on small and biased datasets, which are common for biomedical use cases. While methods exist that incorporate multi-scale fusion approaches to address the challenges arising with variable sizes,…

    Submitted 30 January, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2022

  37. arXiv:2104.10481  [pdf, other]

    cs.CV cs.LG

    SKID: Self-Supervised Learning for Knee Injury Diagnosis from MRI Data

    Authors: Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: In medical image analysis, the cost of acquiring high-quality data and their annotation by experts is a barrier in many medical applications. Most of the techniques used are based on supervised learning framework and need a large amount of annotated data to achieve satisfactory performance. As an alternative, in this paper, we propose a self-supervised learning (SSL) approach to learn the spatial…

    Submitted 19 October, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

  38. arXiv:2011.04994  [pdf, other]

    cs.CV eess.IV

    AIM 2020 Challenge on Learned Image Signal Processing Pipeline

    Authors: Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, Wangmeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, et al. (14 additional authors not shown)

    Abstract: This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where the goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of com…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Published in ECCV 2020 Workshops (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

  39. arXiv:2010.12669  [pdf, other]

    cs.CV cs.HC

    Position and Rotation Invariant Sign Language Recognition from 3D Kinect Data with Recurrent Neural Networks

    Authors: Prasun Roy, Saumik Bhattacharya, Partha Pratim Roy, Umapada Pal

    Abstract: Sign language is a gesture-based symbolic communication medium among speech and hearing impaired people. It also serves as a communication bridge between non-impaired and impaired populations. Unfortunately, in most situations, a non-impaired person is not well conversant in such symbolic languages restricting the natural information flow between these two categories. Therefore, an automated trans…

    Submitted 14 March, 2023; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: 10 pages

  40. Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR Videos

    Authors: Siladittya Manna, Saumik Bhattacharya, Umapada Pal

    Abstract: The success of deep learning based models for computer vision applications requires large scale human annotated data which are often expensive to generate. Self-supervised learning, a subset of unsupervised learning, handles this problem by learning meaningful features from unlabeled image or video data. In this paper, we propose a self-supervised learning approach to learn transferable features f…

    Submitted 14 December, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  41. arXiv:2007.07075  [pdf, other]

    cs.CV

    UDBNET: Unsupervised Document Binarization Network via Adversarial Game

    Authors: Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

    Abstract: Degraded document image binarization is one of the most challenging tasks in the domain of document image analysis. In this paper, we present a novel approach towards document image binarization by introducing three-player min-max adversarial game. We train the network in an unsupervised setup by assuming that we do not have any paired-training data. In our approach, an Adversarial Texture Augment…

    Submitted 27 October, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

    Comments: Accepted in ICPR 2020

  42. arXiv:2005.12524  [pdf]

    cs.CV cs.MM

    A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

    Authors: Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein

    Abstract: Detecting text located on the torsos of marathon runners and sports players in video is a challenging issue due to poor quality and adverse effects caused by flexible/colorful clothing, and different structures of human bodies or actions. This paper presents a new unified method for tackling the above challenges. The proposed method fuses gradient magnitude and direction coherence of text pixels i…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: Accepted in Pattern Recognition, Elsevier

  43. arXiv:2004.08141  [pdf, other]

    cs.CV

    Modeling Extent-of-Texture Information for Ground Terrain Recognition

    Authors: Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

    Abstract: Ground Terrain Recognition is a difficult task as the context information varies significantly over the regions of a ground terrain image. In this paper, we propose a novel approach towards ground-terrain recognition via modeling the Extent-of-Texture information to establish a balance between the order-less texture component and ordered-spatial information locally. At first, the proposed method u…

    Submitted 27 October, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: Accepted in ICPR 2020

  44. arXiv:1910.01853  [pdf]

    cs.CV cs.LG

    DELP-DAR System for License Plate Detection and Recognition

    Authors: Zied Selmi, Mohamed Ben Halima, Umapada Pal, M. Adel Alimi

    Abstract: Automatic License Plate detection and Recognition (ALPR) is a quite popular and active research topic in the field of computer vision, image processing and intelligent transport systems. ALPR is used to make detection and recognition processes more robust and efficient in highly complicated environments and backgrounds. Several research investigations are still necessary due to some constraints su…

    Submitted 4 October, 2019; originally announced October 2019.

  45. arXiv:1907.00945  [pdf, ps, other]

    cs.CV

    ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019

    Authors: Nibal Nayef, Yash Patel, Michal Busta, Pinaki Nath Chowdhury, Dimosthenis Karatzas, Wafa Khlif, Jiri Matas, Umapada Pal, Jean-Christophe Burie, Cheng-lin Liu, Jean-Marc Ogier

    Abstract: With the growing cosmopolitan culture of modern cities, the need for robust Multi-Lingual scene Text (MLT) detection and recognition systems has never been more immense. With the goal to systematically benchmark and push the state-of-the-art forward, the proposed competition builds on top of the RRC-MLT-2017 with an additional end-to-end task, an additional language in the real images dataset, a la…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: ICDAR'19 camera-ready version. Competition available at https://rrc.cvc.uab.es/?ch=15. The first two authors contributed equally

  46. arXiv:1905.01168  [pdf, other]

    cs.CV

    Distance Metric Learned Collaborative Representation Classifier

    Authors: Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

    Abstract: Any generic deep machine learning algorithm is essentially a function fitting exercise, where the network tunes its weights and parameters to learn discriminatory features by minimizing some cost function. Though the network tries to learn the optimal feature space, it seldom tries to learn an optimal distance metric in the cost function, and hence misses out on an additional layer of abstraction.…

    Submitted 30 September, 2021; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: text overlap with arXiv:1903.09123

  47. arXiv:1903.09123  [pdf, other]

    cs.CV

    PProCRC: Probabilistic Collaboration of Image Patches

    Authors: Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

    Abstract: We present a conditional probabilistic framework for collaborative representation of image patches. It incorporates background compensation and outlier patch suppression into the main formulation itself, thus doing away with the need for pre-processing steps to handle the same. A closed form non-iterative solution of the cost function is derived. The proposed method (PProCRC) outperforms earlier C…

    Submitted 9 November, 2020; v1 submitted 21 March, 2019; originally announced March 2019.

  48. arXiv:1903.01192  [pdf, other]

    cs.CV cs.MM

    STEFANN: Scene Text Editor using Font Adaptive Neural Network

    Authors: Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

    Abstract: Textual information in a captured scene plays an important role in scene interpretation and decision making. Though there exist methods that can successfully detect and interpret complex text regions present in a scene, to the best of our knowledge, there is no significant prior work that aims to modify the textual information in an image. The ability to edit text directly on images has several ad…

    Submitted 29 March, 2023; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: Accepted in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020

  49. arXiv:1901.09886  [pdf, other]

    cs.CV

    CoCoNet: A Collaborative Convolutional Network

    Authors: Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

    Abstract: We present an end-to-end deep network for fine-grained visual categorization called Collaborative Convolutional Network (CoCoNet). The network uses a collaborative layer after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from training samples as a whole rather than one at a time. This gives CoCoNet more power to encode the fine-grained nat…

    Submitted 9 November, 2020; v1 submitted 28 January, 2019; originally announced January 2019.

  50. A Deep One-Shot Network for Query-based Logo Retrieval

    Authors: Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal

    Abstract: Logo detection in real-world scene images is an important problem with applications in advertisement and marketing. Existing general-purpose object detection methods require large training data with annotations for every logo class. These methods do not satisfy the incremental demand of logo classes necessary for practical deployment since it is practically impossible to have such annotated data f…

    Submitted 13 July, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

    Comments: Accepted in Pattern Recognition, Elsevier (2019)