Showing 1–18 of 18 results for author: G, V

  1. arXiv:2405.17449  [pdf]

    cs.CV cs.AI cs.LG

    Image Based Character Recognition, Documentation System To Decode Inscription From Temple

    Authors: Velmathi G, Shangavelan M, Harish D, Krithikshun M S

    Abstract: This project undertakes the training and analysis of optical character recognition (OCR) methods applied to 10th-century ancient Tamil inscriptions discovered on the walls of the Brihadeeswarar Temple. The chosen OCR methods include Tesseract, a widely used OCR engine, using modern ICR techniques to preprocess the raw data and a box editing software to fine-tune our model. The analysis with Tesseract ai…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: This research paper is a part of capstone project submitted to VIT Chennai, VIT University

  2. arXiv:2403.06350  [pdf, other]

    cs.CL

    IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

    Authors: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

    Abstract: Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-re…

    Submitted 10 March, 2024; originally announced March 2024.

  3. arXiv:2401.00125  [pdf, other]

    cs.AI cs.CV

    LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

    Authors: S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker

    Abstract: Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that requir…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 15 pages, 8 figures, 7 tables

  4. arXiv:2401.00094  [pdf, other]

    cs.CV

    Generating Enhanced Negatives for Training Language-Based Object Detectors

    Authors: Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

    Abstract: The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make…

    Submitted 12 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: Accepted to CVPR 2024. The supplementary document included

  5. arXiv:2311.01295  [pdf, ps, other]

    cs.LG cs.CR cs.CV

    DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

    Authors: Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler

    Abstract: Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the l…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 17 pages, 2 figures, to be published in Neural Information Processing Systems 2023

  6. arXiv:2308.06412  [pdf, other]

    cs.CV

    Taming Self-Training for Open-Vocabulary Object Detection

    Authors: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

    Abstract: Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distr…

    Submitted 12 April, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted to CVPR 2024. The supplementary document included

  7. arXiv:2304.11463  [pdf, other]

    cs.CV

    OmniLabel: A Challenging Benchmark for Language-Based Object Detection

    Authors: Samuel Schulter, Vijay Kumar B G, Yumin Suh, Konstantinos M. Dafnis, Zhixing Zhang, Shiyu Zhao, Dimitris Metaxas

    Abstract: Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection a…

    Submitted 14 August, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Oral - Visit our project website at https://www.omnilabel.org

  8. arXiv:2304.10256  [pdf]

    cs.CV cs.CL cs.LG

    Indian Sign Language Recognition Using Mediapipe Holistic

    Authors: Dr. Velmathi G, Kaushal Goyal

    Abstract: Deaf individuals confront significant communication obstacles on a daily basis. Their inability to hear makes it difficult for them to communicate with those who do not understand sign language. Moreover, it presents difficulties in educational, occupational, and social contexts. By providing alternative communication channels, technology can play a crucial role in overcoming these obstacles. One…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 16 pages, 22 figures

  9. arXiv:2207.08954  [pdf, other]

    cs.CV

    Exploiting Unlabeled Data with Vision and Language Models for Object Detection

    Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

    Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectiv…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022 (with the supplementary document)

  10. arXiv:2112.07170  [pdf, other]

    cs.NI

    Performance evaluation of the QOS provisioning ability of IEEE 802.11e WLAN standard for multimedia traffic

    Authors: Venkata Sitaram. A, Venkatesh. T. G, Arun George, Manivasakan. R, Bhasker Dappuri

    Abstract: This paper presents an analytical model for the average frame transmission delay and the jitter for the different Access Categories (ACs) of the IEEE 802.11e Enhanced Distributed Channel Access (EDCA) mechanism. Following are the salient features of our model. As defined by the standard we consider (1) the virtual collisions among different ACs inside each EDCA station in addition to external coll…

    Submitted 14 December, 2021; originally announced December 2021.

  11. arXiv:2110.08510  [pdf, ps, other]

    cs.LG cs.IR cs.SI

    DFW-PP: Dynamic Feature Weighting based Popularity Prediction for Social Media Content

    Authors: Viswanatha Reddy G, Chaitanya B S N V, Prathyush P, Sumanth M, Mrinalini C, Dileep Kumar P, Snehasis Mukherjee

    Abstract: The increasing popularity of social media platforms makes it important to study user engagement, which is a crucial aspect of any marketing strategy or business model. The over-saturation of content on social media platforms has persuaded us to identify the important factors that affect content popularity. This comes from the fact that only an iota of the humongous content available online receive…

    Submitted 16 October, 2021; originally announced October 2021.

  12. arXiv:2109.02762  [pdf, other]

    cs.CV

    STRIVE: Scene Text Replacement In Videos

    Authors: Vijay Kumar B G, Jeyasri Subramanian, Varnith Chordia, Eugene Bart, Shaobo Fang, Kelly Guan, Raja Bala

    Abstract: We propose replacing scene text in videos using deep style transfer and learned photometric transformations. Building on recent progress on still image text replacement, we present extensions that alter text while preserving the appearance and motion characteristics of the original video. Compared to the problem of still image text replacement, our method addresses additional challenges introduced by…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021, Project Page: https://striveiccv2021.github.io/STRIVE-ICCV2021/

  13. arXiv:1909.06734  [pdf]

    cs.SE

    A brief TOGAF description using SEMAT Essence Kernel

    Authors: David C. Múnera, Fernán A. Villa G

    Abstract: This work aims to explore the possibility of describing the enterprise architecture framework TOGAF using the SEMAT Essence kernel, to see whether they fit together, and whether such a marriage brings to light any weaknesses of the models.

    Submitted 15 September, 2019; originally announced September 2019.

  14. arXiv:1806.00911  [pdf, other]

    cs.CV

    Bayesian Semantic Instance Segmentation in Open Set World

    Authors: Trung Pham, Vijay Kumar B G, Thanh-Toan Do, Gustavo Carneiro, Ian Reid

    Abstract: This paper addresses the semantic instance segmentation task in the open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire or even infeasible in some realistic scenarios, where the number of categories may increase…

    Submitted 29 July, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

    Comments: Accepted to ECCV 2018

  15. arXiv:1704.01285  [pdf, other]

    cs.CV

    Smart Mining for Deep Metric Learning

    Authors: Ben Harwood, Vijay Kumar B G, Gustavo Carneiro, Ian Reid, Tom Drummond

    Abstract: To solve deep metric learning problems and produce feature embeddings, current methodologies commonly use a triplet model to minimise the relative distance between samples from the same class and maximise the relative distance between samples from different classes. Though successful, the training convergence of this triplet model can be compromised by the fact that the vast majority of the…

    Submitted 27 July, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

    Comments: *Vijay Kumar B G and Ben Harwood contributed equally to this work. Accepted in IEEE International Conference on Computer Vision, ICCV 2017

  16. arXiv:1611.08998  [pdf, other]

    cs.CV cs.AI cs.LG

    DeepSetNet: Predicting Sets with Deep Neural Networks

    Authors: S. Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid

    Abstract: This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for…

    Submitted 10 August, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

    Comments: Accepted in IEEE International Conference on Computer Vision (ICCV), Venice, 2017, (Spotlight)

  17. arXiv:1512.09272  [pdf, other]

    cs.CV

    Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

    Authors: Vijay Kumar B G, Gustavo Carneiro, Ian Reid

    Abstract: Recent innovations in training deep convolutional neural network (ConvNet) models have motivated the design of new methods to automatically learn local image descriptors. The latest deep ConvNets proposed for this task consist of a siamese network that is trained by penalising misclassification of pairs of local image patches. Current results from machine learning show that replacing this siamese…

    Submitted 1 August, 2016; v1 submitted 31 December, 2015; originally announced December 2015.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR 2016)

  18. arXiv:1506.01398  [pdf]

    cs.CV

    Recognition of Changes in SAR Images Based on Gauss-Log Ratio and MRFFCM

    Authors: Jismy Alphonse, Biju V. G.

    Abstract: A modified version of MRFFCM (Markov Random Field Fuzzy C means) based SAR (Synthetic aperture Radar) image change detection method is proposed in this paper. It involves three steps: Difference Image (DI) generation by using Gauss-log ratio operator, speckle noise reduction by SRAD (Speckle Reducing Anisotropic Diffusion), and the detection of changed regions by using MRFFCM. The proposed method…

    Submitted 3 June, 2015; originally announced June 2015.

    Comments: 7 pages, 7 figures, 2 tables in International Journal of advanced studies in Computer Science and Engineering (IJASCSE), ISSN : 2278 7917, Volume 4 Issue 5, 2015, www.ijascse.org

    Report number: page 65-71