Showing 1–50 of 66 results for author: Bhunia, A

  1. arXiv:2407.01810  [pdf, other]

    cs.CV

    Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval

    Authors: Aneeshan Sain, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

    Abstract: In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employ…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in European Conference on Computer Vision (ECCV) 2024

  2. arXiv:2406.20099  [pdf, other]

    cs.CV

    Odd-One-Out: Anomaly Detection by Comparing with Neighbors

    Authors: Ankan Bhunia, Changjian Li, Hakan Bilen

    Abstract: This paper introduces a novel anomaly detection (AD) problem that focuses on identifying 'odd-looking' objects relative to the other instances within a scene. Unlike the traditional AD benchmarks, anomalies in our setting are scene-specific, defined by the regular instances that make up the majority. Since object instances are often partly visible from a single viewpoint, our sett…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Codes & Dataset at https://github.com/VICO-UoE/OddOneOutAD

  3. arXiv:2406.19393  [pdf, other]

    cs.CV

    Looking 3D: Anomaly Detection with 2D-3D Alignment

    Authors: Ankan Bhunia, Changjian Li, Hakan Bilen

    Abstract: Automatic anomaly detection based on visual cues holds practical significance in various domains, such as manufacturing and product quality assessment. This paper introduces a new conditional anomaly detection problem, which involves identifying anomalies in a query image by comparing it to a reference shape. To address this challenge, we have created a large dataset, BrokenChairs-180K, consisting…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPR'24. Codes & dataset available at https://github.com/VICO-UoE/Looking3D

  4. arXiv:2405.18716  [pdf, other]

    cs.CV

    SketchDeco: Decorating B&W Sketches with Colour

    Authors: Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

    Abstract: This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the l…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2403.09480  [pdf, other]

    cs.CV cs.AI

    What Sketch Explainability Really Means for Downstream Tasks

    Authors: Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Tao Xiang, Yi-Zhe Song

    Abstract: In this paper, we explore the unique modality of sketch for explainability, emphasising the profound impact of human strokes compared to conventional pixel-oriented studies. Beyond explanations of network behavior, we discern the genuine implications of explainability across diverse downstream sketch-related tasks. We propose a lightweight and portable explainability solution -- a seamless plugin…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  6. arXiv:2403.09344  [pdf, other]

    cs.CV cs.AI

    SketchINR: A First Look into Sketches as Implicit Neural Representations

    Authors: Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song

    Abstract: We propose SketchINR, to advance the representation of vector sketches with implicit neural models. A variable length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes. The learned function predicts the $xy$ point coordinates in a sketch at each time and stroke. Despite its simplicity, SketchINR outperf…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  7. arXiv:2403.07234  [pdf, other]

    cs.CV

    It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

    Authors: Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from…

    Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/StableSketching

  8. arXiv:2403.07222  [pdf, other]

    cs.CV

    You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

    Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously ex…

    Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/Sketch2Word

  9. arXiv:2403.07214  [pdf, other]

    cs.CV

    Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

    Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pi…

    Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/DiffusionZSSBIR

  10. arXiv:2403.07203  [pdf, other]

    cs.CV

    How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?

    Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: In this paper, we propose a novel abstraction-aware sketch-based image retrieval framework capable of handling sketch abstraction at varied levels. Prior works had mainly focused on tackling sub-factors such as drawing style and order; we instead attempt to model abstraction as a whole, and propose feature-level and retrieval granularity-level designs so that the system builds into its DNA the nec…

    Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/AbstractAway

  11. arXiv:2312.04364  [pdf, other]

    cs.CV

    DemoCaricature: Democratising Caricature Generation with a Rough Sketch

    Authors: Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song

    Abstract: In this paper, we democratise caricature generation, empowering individuals to effortlessly craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to strike a delicate balance between abstraction and identity, while preserving the creativity and subjectivity inherent in a sketch. To achieve this, we present Explicit Rank-1 Model Editing alongside single-image pe…

    Submitted 24 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  12. arXiv:2312.04043  [pdf, other]

    cs.CV cs.AI

    Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes

    Authors: Hmrishav Bandyopadhyay, Subhadeep Koley, Ayan Das, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: In this paper, we democratise 3D content creation, enabling precise generation of 3D shapes from abstract sketches while overcoming limitations tied to drawing skills. We introduce a novel part-level modelling and alignment framework that facilitates abstraction modelling and cross-modal correspondence. Leveraging the same part-level decoder, our approach seamlessly extends to sketch modelling by…

    Submitted 7 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: CVPR 2024, Project Page: https://hmrishavbandy.github.io/doodle23d/

  13. arXiv:2304.01992  [pdf, other]

    eess.IV cs.CV

    Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

    Authors: Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

    Abstract: In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues. Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images. Within our XM-GAN, a novel controllable fusion block densely aggregates local r…

    Submitted 4 July, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Early Accept in MICCAI 2023

  14. arXiv:2304.01172  [pdf, other]

    cs.CV

    Generative Multiplane Neural Radiance for 3D-Aware Image Generation

    Authors: Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

    Abstract: We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views. The proposed multiplane neural radiance model, named GMNR, consists of a novel α-guided view-dependent representation (α-VdR) module for learning view-dependent information. The α-VdR module, facilitated by an α-guided pixel sampling technique, computes the view-depende…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Technical report

  15. arXiv:2303.15149  [pdf, other]

    cs.CV

    What Can Human Sketches Do for Object Detection?

    Authors: Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song

    Abstract: Sketches are highly expressive, inherently capturing subjective and fine-grained visual cues. The exploration of such innate properties of human sketches has, however, been limited to that of image retrieval. In this paper, for the first time, we cultivate the expressiveness of sketches but for the fundamental vision task of object detection. The end result is a sketch-enabled object detection fra…

    Submitted 28 October, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Best Paper Finalist (Top 12 Best Papers). Presented in special single-track plenary sessions to all attendees in Computer Vision and Pattern Recognition (CVPR), 2023. Updated an error in Fig.3 (from Softmax to Cross Entropy). Thanks to the community for pointing it out

  16. arXiv:2303.13779  [pdf, other]

    cs.CV

    Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

    Authors: Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song

    Abstract: This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that surpasses the prior state of the art by ~11%. This is not via complicated design though, but by addressing two critical issues facing the community: (i) the gold standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2023. Project page available at https://aneeshan95.github.io/Sketch_PVT/

  17. arXiv:2303.13440  [pdf, other]

    cs.CV

    CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

    Authors: Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep Koley, Tao Xiang, Yi-Zhe Song

    Abstract: In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained set…

    Submitted 27 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2023. Project page available at https://aneeshan95.github.io/Sketch_LVM/

  18. arXiv:2303.11502  [pdf, other]

    cs.CV

    Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

    Authors: Ayan Kumar Bhunia, Subhadeep Koley, Amandeep Kumar, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: Human sketch has already proved its worth in various visual understanding tasks (e.g., retrieval, segmentation, image-captioning, etc). In this paper, we reveal a new trait of sketches - that they are also salient. This is intuitive as sketching is a natural attentive process at its core. More specifically, we aim to study how sketches can be used as a weak label to detect salient objects present…

    Submitted 30 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page available at https://ayankumarbhunia.github.io/Sketch2Saliency/

  19. arXiv:2303.11162  [pdf, other]

    cs.CV

    Picture that Sketch: Photorealistic Image Generation from Abstract Sketches

    Authors: Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: Given an abstract, deformed, ordinary sketch from untrained amateurs like you and me, this paper turns it into a photorealistic image - just like those shown in Fig. 1(a), all non-cherry-picked. We differ significantly from prior art in that we do not dictate an edgemap-like sketch to start with, but aim to work with abstract free-hand human sketches. In doing so, we essentially democratise the sk…

    Submitted 30 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2023. Project page available at https://subhadeepkoley.github.io/PictureThatSketch

  20. arXiv:2303.07775  [pdf, other]

    cs.CV

    Data-Free Sketch-Based Image Retrieval

    Authors: Abhra Chaudhuri, Ayan Kumar Bhunia, Yi-Zhe Song, Anjan Dutta

    Abstract: Rising concerns about privacy and anonymity preservation of deep learning models have facilitated research in data-free learning (DFL). For the first time, we identify that for data-scarce tasks like Sketch-Based Image Retrieval (SBIR), where the difficulty in acquiring paired photos and hand-drawn sketches limits data-dependent cross-modal learning algorithms, DFL can prove to be a much more prac…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Computer Vision and Pattern Recognition (CVPR) 2023

  21. arXiv:2211.12500  [pdf, other]

    cs.CV

    Person Image Synthesis via Denoising Diffusion Model

    Authors: Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

    Abstract: The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative adversarial networks that do not necessarily maintain realistic textures or need dense correspondences that struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for…

    Submitted 28 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to CVPR 2023

  22. arXiv:2210.15146  [pdf, other]

    cs.CV

    Towards Practicality of Sketch-Based Visual Understanding

    Authors: Ayan Kumar Bhunia

    Abstract: Sketches have been used to conceptualise and depict visual objects from pre-historic times. Sketch research has flourished in the past decade, particularly with the proliferation of touchscreen devices. Much of the utilisation of sketch has been anchored around the fact that it can be used to delineate visual concepts universally irrespective of age, race, language, or demography. The fine-grained…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: PhD thesis successfully defended by Ayan Kumar Bhunia, Supervisor: Prof. Yi-Zhe Song, Thesis Examiners: Prof Stella Yu and Prof Adrian Hilton

  23. arXiv:2207.01723  [pdf, other]

    cs.CV

    Adaptive Fine-Grained Sketch-Based Image Retrieval

    Authors: Ayan Kumar Bhunia, Aneeshan Sain, Parth Shah, Animesh Gupta, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: The recent focus on Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) has shifted towards generalising a model to new categories without any training data from them. In real-world applications, however, a trained FG-SBIR model is often applied to both new categories and different human sketchers, i.e., different drawing styles. Although this complicates the generalisation problem, fortunately, a…

    Submitted 19 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted in ECCV 2022. Minor typos and Eq.4 corrected

  24. arXiv:2204.11964  [pdf, other]

    cs.CV

    SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

    Authors: Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Yi-Zhe Song

    Abstract: In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary modalities -- sketch, photo, and text. Instead of learning a rigid three-way embedding and be done with it, we focus on learning a flexible joint embedding that fully supports the "optionality" that this complementarity brings.…

    Submitted 26 March, 2023; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted in Computer Vision and Pattern Recognition (CVPR), 2023

  25. arXiv:2203.14843  [pdf, other]

    cs.CV

    Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

    Authors: Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, Yi-Zhe Song

    Abstract: The human visual system is remarkable in learning new visual concepts from just a few examples. This is precisely the goal behind few-shot class incremental learning (FSCIL), where the emphasis is additionally placed on ensuring the model does not suffer from "forgetting". In this paper, we push the boundary further for FSCIL by addressing two key questions that bottleneck its ubiquitous applicati…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 10 pages, 3 figures. Accepted in CVPR 2022

  26. arXiv:2203.14817  [pdf, other]

    cs.CV

    Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

    Authors: Ayan Kumar Bhunia, Subhadeep Koley, Abdullah Faiz Ur Rahman Khilji, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: Sketching enables many exciting applications, notably, image retrieval. The fear-to-sketch problem (i.e., "I can't sketch") has however proven to be fatal for its widespread adoption. This paper tackles this "fear" head on, and for the first time, proposes an auxiliary module for existing retrieval models that predominantly lets the users sketch without having to worry. We first conducted a pilot…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022 Code: https://github.com/AyanKumarBhunia/Stroke_Subset_Selector-for-FGSBIR

  27. arXiv:2203.14804  [pdf, other]

    cs.CV

    Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

    Authors: Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Aneeshan Sain, Tao Xiang, Yi-Zhe Song

    Abstract: We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exists significant empty (white) regions as a result of object-level abstrac…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted in CVPR 2022

  28. arXiv:2203.14691  [pdf, other]

    cs.CV

    Sketch3T: Test-Time Training for Zero-Shot SBIR

    Authors: Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories. In this paper, we argue that this setup by definition is not compatible with the inherent abstract and subjective nature of sketches, i.e., the model might transfer well to new categories, but will not understand sketches existing in different test-time distribution as a…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 10 pages, 5 figures. Accepted in CVPR 2022

  29. arXiv:2203.02113  [pdf, other]

    cs.CV

    FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

    Authors: Pinaki Nath Chowdhury, Aneeshan Sain, Ayan Kumar Bhunia, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song

    Abstract: We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per point space-time information by 100 non-expert individuals, offeri…

    Submitted 20 July, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted in ECCV 2022. Project Page: https://fscoco.github.io

  30. arXiv:2112.03258  [pdf, other]

    cs.CV cs.GR

    DoodleFormer: Creative Sketch Drawing with Transformers

    Authors: Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg

    Abstract: Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage fra…

    Submitted 15 September, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted to ECCV-2022. Project webpage: https://ankanbhunia.github.io/doodleformer/

  31. arXiv:2107.12090  [pdf, other]

    cs.CV

    Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

    Authors: Ayan Kumar Bhunia, Aneeshan Sain, Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Yi-Zhe Song

    Abstract: Although text recognition has significantly evolved over the years, state-of-the-art (SOTA) models still struggle in the wild scenarios due to complex backgrounds, varying fonts, uncontrolled illuminations, distortions and other artefacts. This is because such models solely depend on visual information for text recognition, thus lacking semantic reasoning capabilities. In this paper, we argue that…

    Submitted 26 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: IEEE International Conference on Computer Vision (ICCV), 2021

  32. arXiv:2107.12087  [pdf, other]

    cs.CV

    Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation

    Authors: Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song

    Abstract: Text recognition remains a fundamental and extensively researched topic in computer vision, largely owing to its wide array of commercial applications. The challenging nature of the very problem however dictated a fragmentation of research efforts: Scene Text Recognition (STR) that deals with text in everyday scenes, and Handwriting Text Recognition (HTR) that tackles hand-written text. In this pa…

    Submitted 27 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: IEEE International Conference on Computer Vision (ICCV), 2021

  33. arXiv:2107.12081  [pdf, other]

    cs.CV

    Towards the Unseen: Iterative Text Recognition by Distilling from Errors

    Authors: Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

    Abstract: Visual text recognition is undoubtedly one of the most extensively researched topics in computer vision. Great progress has been made to date, with the latest models starting to focus on the more practical "in-the-wild" setting. However, a salient problem still hinders practical deployment -- prior arts mostly struggle with recognising unseen (or rarely seen) character sequences. In this paper, w…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: IEEE International Conference on Computer Vision (ICCV), 2021

  34. arXiv:2104.03964  [pdf, other]

    cs.CV

    Handwriting Transformers

    Authors: Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak Shah

    Abstract: We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local writing style patterns. The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the propos…

    Submitted 8 April, 2021; originally announced April 2021.

    Journal ref: ICCV 2021

  35. arXiv:2104.01876  [pdf, other]

    cs.CV

    MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition

    Authors: Ayan Kumar Bhunia, Shuvozit Ghose, Amandeep Kumar, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

    Abstract: Handwritten Text Recognition (HTR) remains a challenging problem to date, largely due to the varying writing styles that exist amongst us. Prior works however generally operate with the assumption that there is a limited number of styles, most of which have already been captured by existing datasets. In this paper, we take a completely different perspective -- we work on the assumption that there…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021

  36. arXiv:2103.15706  [pdf, other]

    cs.CV

    StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

    Authors: Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

    Abstract: Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has been largely ignored so far, that is, sketches are drawn by humans and considerable style variations exist amongst different users. An eff…

    Submitted 31 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021

  37. arXiv:2103.13990  [pdf, other]

    cs.CV

    More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval

    Authors: Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Yongxin Yang, Tao Xiang, Yi-Zhe Song

    Abstract: A fundamental challenge faced by existing Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) models is data scarcity -- model performances are largely bottlenecked by the lack of sketch-photo pairs. Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced. In this paper, we aim to mitigate such an upper-bound on sketch data, and study…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021 Code : https://github.com/AyanKumarBhunia/semisupervised-FGSBIR

  38. arXiv:2103.13716  [pdf, other]

    cs.CV

    Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

    Authors: Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song

    Abstract: Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pre-text tasks are challenging to design and usually modality specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or tem…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021 Code : https://github.com/AyanKumarBhunia/Self-Supervised-Learning-for-Sketch

  39. arXiv:2007.15103  [pdf, other]

    cs.CV cs.IR

    Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

    Authors: Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

    Abstract: Sketch as an image search query is an ideal alternative to text in capturing the fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of…

    Submitted 11 August, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: Accepted for ORAL presentation in BMVC 2020

  40. arXiv:2003.03836  [pdf, other]

    cs.CV

    Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

    Authors: Ruoyi Du, Dongliang Chang, Ayan Kumar Bhunia, Jiyang Xie, Zhanyu Ma, Yi-Zhe Song, Jun Guo

    Abstract: Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been placed to which granularities are the most… ▽ More

    Submitted 19 July, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

  41. arXiv:2002.10310  [pdf, other]

    cs.CV

    Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

    Authors: Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song

    Abstract: Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, wi… ▽ More

    Submitted 11 May, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020 [Oral Presentation] Code: https://github.com/AyanKumarBhunia/on-the-fly-FGSBIR

  42. The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification

    Authors: Dongliang Chang, Yifeng Ding, Jiyang Xie, Ayan Kumar Bhunia, Xiaoxu Li, Zhanyu Ma, Ming Wu, Jun Guo, Yi-Zhe Song

    Abstract: Key for solving fine-grained image categorization is finding discriminate and local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminate feature representations. In this paper, we show it is possible to cultivate subtle details without the need for overly complicated network designs or training m… ▽ More

    Submitted 10 August, 2021; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: TIP2020. Code available at https://github.com/dongliangchang/Mutual-Channel-Loss

  43. Facial Micro-Expression Spotting and Recognition using Time Contrasted Feature with Visual Memory

    Authors: Sauradip Nag, Ayan Kumar Bhunia, Aishik Konwer, Partha Pratim Roy

    Abstract: Facial micro-expressions are sudden involuntary minute muscle movements which reveal true emotions that people try to conceal. Spotting a micro-expression and recognizing it is a major challenge owing to its short duration and intensity. Many works pursued traditional and deep learning based approaches to solve this issue but compromised on learning low-level features and higher accuracy due to un… ▽ More

    Submitted 18 April, 2019; v1 submitted 9 February, 2019; originally announced February 2019.

    Comments: International Conference on Acoustics, Speech, and Signal Processing(ICASSP), 2019

  44. Texture Synthesis Guided Deep Hashing for Texture Image Retrieval

    Authors: Ayan Kumar Bhunia, Perla Sai Raj Kishore, Pranay Mukherjee, Abhirup Das, Partha Pratim Roy

    Abstract: With the large-scale explosion of images and videos over the internet, efficient hashing methods have been developed to facilitate memory and time efficient retrieval of similar images. However, none of the existing works uses hashing to address texture image retrieval mostly because of the lack of sufficiently large texture image databases. Our work addresses this problem by developing a novel de… ▽ More

    Submitted 5 June, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

    Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019 Video Presentation: https://www.youtube.com/watch?v=tXaXTGhzaJo

  45. arXiv:1811.01396  [pdf, other]

    cs.CV

    Handwriting Recognition in Low-resource Scripts using Adversarial Learning

    Authors: Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy

    Abstract: Handwritten Word Recognition and Spotting is a challenging field dealing with handwritten text possessing irregular and complex shapes. The design of deep neural network models makes it necessary to extend training datasets in order to introduce variations and increase the number of samples; word-retrieval is therefore very difficult in low-resource scripts. Much of the existing literature compris… ▽ More

    Submitted 25 February, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

    Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019

  46. A Deep One-Shot Network for Query-based Logo Retrieval

    Authors: Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal

    Abstract: Logo detection in real-world scene images is an important problem with applications in advertisement and marketing. Existing general-purpose object detection methods require large training data with annotations for every logo class. These methods do not satisfy the incremental demand of logo classes necessary for practical deployment since it is practically impossible to have such annotated data f… ▽ More

    Submitted 13 July, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

    Comments: Accepted in Pattern Recognition, Elsevier(2019)

  47. arXiv:1811.00201  [pdf, other]

    cs.CV

    Cogni-Net: Cognitive Feature Learning through Deep Visual Perception

    Authors: Pranay Mukherjee, Abhirup Das, Ayan Kumar Bhunia, Partha Pratim Roy

    Abstract: Can we ask computers to recognize what we see from brain signals alone? Our paper seeks to utilize the knowledge learnt in the visual domain by popular pre-trained vision models and use it to teach a recurrent model being trained on brain signals to learn a discriminative manifold of the human brain's cognition of different visual object categories in response to perceived visual cues. For this we… ▽ More

    Submitted 1 May, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: IEEE International Conference on Image Processing (ICIP), 2019

  48. User Constrained Thumbnail Generation using Adaptive Convolutions

    Authors: Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, Partha Pratim Roy

    Abstract: Thumbnails are widely used all over the world as a preview for digital images. In this work we propose a deep neural framework to generate thumbnails of any size and aspect ratio, even for unseen values during training, with high accuracy and precision. We use Global Context Aggregation (GCA) and a modified Region Proposal Network (RPN) with adaptive convolutions to generate thumbnails in real tim… ▽ More

    Submitted 18 April, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: International Conference on Acoustics, Speech, and Signal Processing(ICASSP), 2019

  49. arXiv:1810.11120  [pdf, other]

    cs.CV

    Improving Document Binarization via Adversarial Noise-Texture Augmentation

    Authors: Ankan Kumar Bhunia, Ayan Kumar Bhunia, Aneeshan Sain, Partha Pratim Roy

    Abstract: Binarization of degraded document images is an elementary step in most of the problems in document image analysis domain. The paper re-visits the binarization problem by introducing an adversarial learning approach. We construct a Texture Augmentation Network that transfers the texture element of a degraded reference document image to a clean binary image. In this way, the network creates multiple… ▽ More

    Submitted 1 May, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: IEEE International Conference on Image Processing (ICIP), 2019. The full source code of the proposed system is publicly available at https://github.com/ankanbhunia/AdverseBiNet

  50. arXiv:1802.08568  [pdf, other]

    cs.CV

    Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

    Authors: Ayan Kumar Bhunia, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

    Abstract: In this paper, we propose a novel approach of word-level Indic script identification using only character-level data in training stage. The advantages of using character level data for training have been outlined in section I. Our method uses a multimodal deep network which takes both offline and online modality of the data as input in order to explore the information from both the modalities join… ▽ More

    Submitted 15 October, 2019; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Accepted in Information Fusion, Elsevier