Search | arXiv e-print repository

Image Based Character Recognition, Documentation System To Decode Inscription From Temple

Authors: Velmathi G, Shangavelan M, Harish D, Krithikshun M S

Abstract: This project undertakes the training and analysis of optical character recognition OCR methods applied to 10th century ancient Tamil inscriptions discovered on the walls of the Brihadeeswarar Temple.The chosen OCR methods include Tesseract,a widely used OCR engine,using modern ICR techniques to pre process the raw data and a box editing software to finetune our model.The analysis with Tesseract ai… ▽ More This project undertakes the training and analysis of optical character recognition OCR methods applied to 10th century ancient Tamil inscriptions discovered on the walls of the Brihadeeswarar Temple.The chosen OCR methods include Tesseract,a widely used OCR engine,using modern ICR techniques to pre process the raw data and a box editing software to finetune our model.The analysis with Tesseract aims to evaluate their effectiveness in accurately deciphering the nuances of the ancient Tamil characters.The performance of our model for the dataset are determined by their accuracy rate where the evaluated dataset divided into training set and testing set.By addressing the unique challenges posed by the script's historical context,this study seeks to contribute valuable insights to the broader field of OCR,facilitating improved preservation and interpretation of ancient inscriptions △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: This research paper is a part of capstone project submitted to VIT Chennai, VIT University

arXiv:2403.06350 [pdf, other]

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

Authors: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

Abstract: Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-re… ▽ More Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-response pairs. Recognizing the importance of both data quality and quantity, our approach combines highly curated manually verified data, unverified yet valuable data, and synthetic data. We build a clean, open-source pipeline for curating pre-training data from diverse sources, including websites, PDFs, and videos, incorporating best practices for crawling, cleaning, flagging, and deduplication. For instruction-fine tuning, we amalgamate existing Indic datasets, translate/transliterate English datasets into Indian languages, and utilize LLaMa2 and Mixtral models to create conversations grounded in articles from Indian Wikipedia and Wikihow. Additionally, we address toxicity alignment by generating toxic prompts for multiple scenarios and then generate non-toxic responses by feeding these toxic prompts to an aligned LLaMa2 model. We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages. The data and other artifacts created as part of this work are released with permissive licenses. △ Less

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2401.00125 [pdf, other]

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning

Authors: S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker

Abstract: Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that requir… ▽ More Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that require complex driving maneuvers. To address these limitations, we investigate the possibility of leveraging the common-sense reasoning capabilities of Large Language Models (LLMs) such as GPT4 and Llama2 to generate plans for self-driving vehicles. In particular, we develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Guided by commonsense reasoning abilities of LLMs, our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach. Through extensive evaluation on the nuPlan benchmark, we achieve state-of-the-art performance, outperforming all existing pure learning- and rule-based methods across most metrics. Our code will be available at https://llmassist.github.io. △ Less

Submitted 29 December, 2023; originally announced January 2024.

Comments: 15 pages, 8 figures, 7 tables

arXiv:2401.00094 [pdf, other]

Generating Enhanced Negatives for Training Language-Based Object Detectors

Authors: Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

Abstract: The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make… ▽ More The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make the space of negatives extremely large. Prior works randomly sample negatives or use rule-based techniques to build them. In contrast, we propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data. Specifically, we use large-language-models to generate negative text descriptions, and text-to-image diffusion models to also generate corresponding negative images. Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks. Code is available at \url{https://github.com/xiaofeng94/Gen-Enhanced-Negs}. △ Less

Submitted 12 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: Accepted to CVPR 2024. The supplementary document included

arXiv:2311.01295 [pdf, ps, other]

DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

Authors: Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler

Abstract: Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the l… ▽ More Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at https://github.com/wenxuan-Bao/DP-Mix. △ Less

Submitted 2 November, 2023; originally announced November 2023.

Comments: 17 pages, 2 figures, to be published in Neural Information Processing Systems 2023

arXiv:2308.06412 [pdf, other]

Taming Self-Training for Open-Vocabulary Object Detection

Authors: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

Abstract: Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distr… ▽ More Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distribution changes of PLs. To address these challenges, we propose SAS-Det that tames self-training for OVD from two key perspectives. First, we present a split-and-fusion (SAF) head that splits a standard detection into an open-branch and a closed-branch. This design can reduce noisy supervision from pseudo boxes. Moreover, the two branches learn complementary knowledge from different training data, significantly enhancing performance when fused together. Second, in our view, unlike in closed-set tasks, the PL distributions in OVD are solely determined by the teacher model. We introduce a periodic update strategy to decrease the number of updates to the teacher, thereby decreasing the frequency of changes in PL distributions, which stabilizes the training process. Extensive experiments demonstrate SAS-Det is both efficient and effective. SAS-Det outperforms recent models of the same scale by a clear margin and achieves 37.4 AP50 and 29.1 APr on novel categories of the COCO and LVIS benchmarks, respectively. Code is available at \url{https://github.com/xiaofeng94/SAS-Det}. △ Less

Submitted 12 April, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: Accepted to CVPR 2024. The supplementary document included

arXiv:2304.11463 [pdf, other]

OmniLabel: A Challenging Benchmark for Language-Based Object Detection

Authors: Samuel Schulter, Vijay Kumar B G, Yumin Suh, Konstantinos M. Dafnis, Zhixing Zhang, Shiyu Zhao, Dimitris Metaxas

Abstract: Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection a… ▽ More Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection as well as referring expressions. With more than 28K unique object descriptions on over 25K images, OmniLabel provides a challenging benchmark with diverse and complex object descriptions in a naturally open-vocabulary setting. Moreover, a key differentiation to existing benchmarks is that our object descriptions can refer to one, multiple or even no object, hence, providing negative examples in free-form text. The proposed evaluation handles the large label space and judges performance via a modified average precision metric, which we validate by evaluating strong language-based baselines. OmniLabel indeed provides a challenging test bed for future research on language-based detection. △ Less

Submitted 14 August, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

Comments: ICCV 2023 Oral - Visit our project website at https://www.omnilabel.org

arXiv:2304.10256 [pdf]

Indian Sign Language Recognition Using Mediapipe Holistic

Authors: Dr. Velmathi G, Kaushal Goyal

Abstract: Deaf individuals confront significant communication obstacles on a daily basis. Their inability to hear makes it difficult for them to communicate with those who do not understand sign language. Moreover, it presents difficulties in educational, occupational, and social contexts. By providing alternative communication channels, technology can play a crucial role in overcoming these obstacles. One… ▽ More Deaf individuals confront significant communication obstacles on a daily basis. Their inability to hear makes it difficult for them to communicate with those who do not understand sign language. Moreover, it presents difficulties in educational, occupational, and social contexts. By providing alternative communication channels, technology can play a crucial role in overcoming these obstacles. One such technology that can facilitate communication between deaf and hearing individuals is sign language recognition. We will create a robust system for sign language recognition in order to convert Indian Sign Language to text or speech. We will evaluate the proposed system and compare CNN and LSTM models. Since there are both static and gesture sign languages, a robust model is required to distinguish between them. In this study, we discovered that a CNN model captures letters and characters for recognition of static sign language better than an LSTM model, but it outperforms CNN by monitoring hands, faces, and pose in gesture sign language phrases and sentences. The creation of a text-to-sign language paradigm is essential since it will enhance the sign language-dependent deaf and hard-of-hearing population's communication skills. Even though the sign-to-text translation is just one side of communication, not all deaf or hard-of-hearing people are proficient in reading or writing text. Some may have difficulty comprehending written language due to educational or literacy issues. Therefore, a text-to-sign language paradigm would allow them to comprehend text-based information and participate in a variety of social, educational, and professional settings. Keywords: deaf and hard-of-hearing, DHH, Indian sign language, CNN, LSTM, static and gesture sign languages, text-to-sign language model, MediaPipe Holistic, sign language recognition, SLR, SLT △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 16 pages, 22 figures

arXiv:2207.08954 [pdf, other]

Exploiting Unlabeled Data with Vision and Language Models for Object Detection

Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectiv… ▽ More Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. Starting with a generic and class-agnostic region proposal mechanism, we use vision and language models to categorize each region of an image into any object category that is required for downstream tasks. We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection, where a model needs to generalize to unseen object categories, and semi-supervised object detection, where additional unlabeled images can be used to improve the model. Our empirical evaluation shows the effectiveness of the pseudo labels in both tasks, where we outperform competitive baselines and achieve a novel state-of-the-art for open-vocabulary object detection. Our code is available at https://github.com/xiaofeng94/VL-PLM. △ Less

Submitted 18 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV 2022 (with the supplementary document)

arXiv:2112.07170 [pdf, other]

Performance evaluation of the QOS provisioning ability of IEEE 802.11e WLAN standard for multimedia traffic

Authors: Venkata Sitaram. A, Venkatesh. T. G, Arun George, Manivasakan. R, Bhasker Dappuri

Abstract: This paper presents an analytical model for the average frame transmission delay and the jitter for the different Access Categories (ACs) of the IEEE 802.11e Enhanced Distributed Channel Access (EDCA) mechanism. Following are the salient features of our model. As defined by the standard we consider (1) the virtual collisions among different ACs inside each EDCA station in addition to external coll… ▽ More This paper presents an analytical model for the average frame transmission delay and the jitter for the different Access Categories (ACs) of the IEEE 802.11e Enhanced Distributed Channel Access (EDCA) mechanism. Following are the salient features of our model. As defined by the standard we consider (1) the virtual collisions among different ACs inside each EDCA station in addition to external collisions. (2) the effect of priority parameters, such as minimum and maximum values of Contention Window (CW) sizes, Arbitration Inter Frame Space (AIFS). (3) the role of Transmission Opportunity (TXOP) of different ACs. (4) the finite number of retrials a packet experiences before being dropped. Our model and analytical results provide an in-depth understanding of the EDCA mechanism and the effect of Quality of Service (QoS) parameters in the performance of IEEE 802.11e protocol. △ Less

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2110.08510 [pdf, ps, other]

DFW-PP: Dynamic Feature Weighting based Popularity Prediction for Social Media Content

Authors: Viswanatha Reddy G, Chaitanya B S N V, Prathyush P, Sumanth M, Mrinalini C, Dileep Kumar P, Snehasis Mukherjee

Abstract: The increasing popularity of social media platforms makes it important to study user engagement, which is a crucial aspect of any marketing strategy or business model. The over-saturation of content on social media platforms has persuaded us to identify the important factors that affect content popularity. This comes from the fact that only an iota of the humongous content available online receive… ▽ More The increasing popularity of social media platforms makes it important to study user engagement, which is a crucial aspect of any marketing strategy or business model. The over-saturation of content on social media platforms has persuaded us to identify the important factors that affect content popularity. This comes from the fact that only an iota of the humongous content available online receives the attention of the target audience. Comprehensive research has been done in the area of popularity prediction using several Machine Learning techniques. However, we observe that there is still significant scope for improvement in analyzing the social importance of media content. We propose the DFW-PP framework, to learn the importance of different features that vary over time. Further, the proposed method controls the skewness of the distribution of the features by applying a log-log normalization. The proposed method is experimented with a benchmark dataset, to show promising results. The code will be made publicly available at https://github.com/chaitnayabasava/DFW-PP. △ Less

Submitted 16 October, 2021; originally announced October 2021.

arXiv:2109.02762 [pdf, other]

STRIVE: Scene Text Replacement In Videos

Authors: Vijay Kumar B G, Jeyasri Subramanian, Varnith Chordia, Eugene Bart, Shaobo Fang, Kelly Guan, Raja Bala

Abstract: We propose replacing scene text in videos using deep style transfer and learned photometric transformations.Building on recent progress on still image text replacement,we present extensions that alter text while preserving the appearance and motion characteristics of the original video.Compared to the problem of still image text replacement,our method addresses additional challenges introduced by… ▽ More We propose replacing scene text in videos using deep style transfer and learned photometric transformations.Building on recent progress on still image text replacement,we present extensions that alter text while preserving the appearance and motion characteristics of the original video.Compared to the problem of still image text replacement,our method addresses additional challenges introduced by video, namely effects induced by changing lighting, motion blur, diverse variations in camera-object pose over time,and preservation of temporal consistency. We parse the problem into three steps. First, the text in all frames is normalized to a frontal pose using a spatio-temporal trans-former network. Second, the text is replaced in a single reference frame using a state-of-art still-image text replacement method. Finally, the new text is transferred from the reference to remaining frames using a novel learned image transformation network that captures lighting and blur effects in a temporally consistent manner. Results on synthetic and challenging real videos show realistic text trans-fer, competitive quantitative and qualitative performance,and superior inference speed relative to alternatives. We introduce new synthetic and real-world datasets with paired text objects. To the best of our knowledge this is the first attempt at deep video text replacement. △ Less

Submitted 6 September, 2021; originally announced September 2021.

Comments: ICCV 2021, Project Page: https://striveiccv2021.github.io/STRIVE-ICCV2021/

arXiv:1909.06734 [pdf]

A brief TOGAF description using SEMAT Essence Kernel

Authors: David C. Múnera, Fernán A. Villa G

Abstract: This work aims to explore the possibility of describing the enterprise architecture framework TOGAF using the Essence kernel SEMAT, see if they fit together, and if such marriage brings into lights any weaknesses of the models. This work aims to explore the possibility of describing the enterprise architecture framework TOGAF using the Essence kernel SEMAT, see if they fit together, and if such marriage brings into lights any weaknesses of the models. △ Less

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1806.00911 [pdf, other]

Bayesian Semantic Instance Segmentation in Open Set World

Authors: Trung Pham, Vijay Kumar B G, Thanh-Toan Do, Gustavo Carneiro, Ian Reid

Abstract: This paper addresses the semantic instance segmentation task in the open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire or even infeasible in some realistic scenarios, where the number of categories may increase… ▽ More This paper addresses the semantic instance segmentation task in the open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire or even infeasible in some realistic scenarios, where the number of categories may increase boundlessly. In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes. We formulate the problem using a Bayesian framework, where the posterior distribution is approximated with a simulated annealing optimization equipped with an efficient image partition sampler. We show empirically that our method is competitive with state-of-the-art supervised methods on known classes, but also performs well on unknown classes when compared with unsupervised methods. △ Less

Submitted 29 July, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

Comments: Accepted to ECCV 2018

arXiv:1704.01285 [pdf, other]

Smart Mining for Deep Metric Learning

Authors: Ben Harwood, Vijay Kumar B G, Gustavo Carneiro, Ian Reid, Tom Drummond

Abstract: To solve deep metric learning problems and producing feature embeddings, current methodologies will commonly use a triplet model to minimise the relative distance between samples from the same class and maximise the relative distance between samples from different classes. Though successful, the training convergence of this triplet model can be compromised by the fact that the vast majority of the… ▽ More To solve deep metric learning problems and producing feature embeddings, current methodologies will commonly use a triplet model to minimise the relative distance between samples from the same class and maximise the relative distance between samples from different classes. Though successful, the training convergence of this triplet model can be compromised by the fact that the vast majority of the training samples will produce gradients with magnitudes that are close to zero. This issue has motivated the development of methods that explore the global structure of the embedding and other methods that explore hard negative/positive mining. The effectiveness of such mining methods is often associated with intractable computational requirements. In this paper, we propose a novel deep metric learning method that combines the triplet model and the global structure of the embedding space. We rely on a smart mining procedure that produces effective training samples for a low computational cost. In addition, we propose an adaptive controller that automatically adjusts the smart mining hyper-parameters and speeds up the convergence of the training process. We show empirically that our proposed method allows for fast and more accurate training of triplet ConvNets than other competing mining methods. Additionally, we show that our method achieves new state-of-the-art embedding results for CUB-200-2011 and Cars196 datasets. △ Less

Submitted 27 July, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

Comments: *Vijay Kumar B G and Ben Harwood contributed equally to this work. Accepted in IEEE International Conference on Computer Vision, ICCV 2017

arXiv:1611.08998 [pdf, other]

DeepSetNet: Predicting Sets with Deep Neural Networks

Authors: S. Hamid Rezatofighi, Vijay Kumar B G, Anton Milan, Ehsan Abbasnejad, Anthony Dick, Ian Reid

Abstract: This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for… ▽ More This paper addresses the task of set prediction using deep learning. This is important because the output of many computer vision tasks, including image tagging and object detection, are naturally expressed as sets of entities rather than vectors. As opposed to a vector, the size of a set is not fixed in advance, and it is invariant to the ordering of entities within it. We define a likelihood for a set distribution and learn its parameters using a deep neural network. We also derive a loss for predicting a discrete distribution corresponding to set cardinality. Set prediction is demonstrated on the problem of multi-class image classification. Moreover, we show that the proposed cardinality loss can also trivially be applied to the tasks of object counting and pedestrian detection. Our approach outperforms existing methods in all three cases on standard datasets. △ Less

Submitted 10 August, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

Comments: Accepted in IEEE International Conference on Computer Vision (ICCV), Venice, 2017, (Spotlight)

arXiv:1512.09272 [pdf, other]

Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions

Authors: Vijay Kumar B G, Gustavo Carneiro, Ian Reid

Abstract: Recent innovations in training deep convolutional neural network (ConvNet) models have motivated the design of new methods to automatically learn local image descriptors. The latest deep ConvNets proposed for this task consist of a siamese network that is trained by penalising misclassification of pairs of local image patches. Current results from machine learning show that replacing this siamese… ▽ More Recent innovations in training deep convolutional neural network (ConvNet) models have motivated the design of new methods to automatically learn local image descriptors. The latest deep ConvNets proposed for this task consist of a siamese network that is trained by penalising misclassification of pairs of local image patches. Current results from machine learning show that replacing this siamese by a triplet network can improve the classification accuracy in several problems, but this has yet to be demonstrated for local image descriptor learning. Moreover, current siamese and triplet networks have been trained with stochastic gradient descent that computes the gradient from individual pairs or triplets of local image patches, which can make them prone to overfitting. In this paper, we first propose the use of triplet networks for the problem of local image descriptor learning. Furthermore, we also propose the use of a global loss that minimises the overall classification error in the training set, which can improve the generalisation capability of the model. Using the UBC benchmark dataset for comparing local image descriptors, we show that the triplet network produces a more accurate embedding than the siamese network in terms of the UBC dataset errors. Moreover, we also demonstrate that a combination of the triplet and global losses produces the best embedding in the field, using this triplet network. Finally, we also show that the use of the central-surround siamese network trained with the global loss produces the best result of the field on the UBC dataset. Pre-trained models are available online at https://github.com/vijaykbg/deep-patchmatch △ Less

Submitted 1 August, 2016; v1 submitted 31 December, 2015; originally announced December 2015.

Comments: IEEE Conference on Computer Vision and Pattern Recognition 2016 (CVPR 2016)

arXiv:1506.01398 [pdf]

Recognition of Changes in SAR Images Based on Gauss-Log Ratio and MRFFCM

Authors: Jismy Alphonse, Biju V. G.

Abstract: A modified version of MRFFCM (Markov Random Field Fuzzy C means) based SAR (Synthetic aperture Radar) image change detection method is proposed in this paper. It involves three steps: Difference Image (DI) generation by using Gauss-log ratio operator, speckle noise reduction by SRAD (Speckle Reducing Anisotropic Diffusion), and the detection of changed regions by using MRFFCM. The proposed method… ▽ More A modified version of MRFFCM (Markov Random Field Fuzzy C means) based SAR (Synthetic aperture Radar) image change detection method is proposed in this paper. It involves three steps: Difference Image (DI) generation by using Gauss-log ratio operator, speckle noise reduction by SRAD (Speckle Reducing Anisotropic Diffusion), and the detection of changed regions by using MRFFCM. The proposed method is compared with existing methods such as FCM and MRFFCM using simulated and real SAR images. The measures used for evaluation includes Overall Error (OE), Percentage Correct Classification (PCC), Kappa Coefficient (KC), Root Mean Square Error (RMSE), and Peak Signal to Noise Ratio (PSNR). The results show that the proposed method is better compared to FCM and MRFFCM based change detection method. △ Less

Submitted 3 June, 2015; originally announced June 2015.

Comments: 7 pages, 7 figures, 2 tables in International Journal of advanced studies in Computer Science and Engineering (IJASCSE), ISSN : 2278 7917, Volume 4 Issue 5, 2015, www.ijascse.org

Report number: page 65-71

Showing 1–18 of 18 results for author: G, V