-
Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method
Authors:
Mrinal Haloi,
Shashank Shekhar,
Nikhil Fande,
Siddhant Swaroop Dash,
Sanjay G
Abstract:
Recent deep learning approaches to table detection have achieved outstanding performance and proved effective in identifying document layouts. Currently available table detection benchmarks have many limitations, including a lack of sample diversity, simple table structures, too few training cases, and poor sample quality. In this paper, we introduce a diverse large-scale dataset for table detection with more than seven thousand samples containing a wide variety of table structures collected from many diverse sources. In addition, we present baseline results using a convolutional neural network-based method to detect table structure in documents. Experimental results show the superiority of convolutional deep learning methods over classical computer vision-based methods. This diverse table detection dataset will enable the community to develop high-throughput deep learning methods for document layout understanding and tabular data processing. The dataset is available at: 1. https://www.kaggle.com/datasets/mrinalim/stdw-dataset 2. https://huggingface.co/datasets/n3011/STDW
Submitted 30 November, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
An Ensemble Model for Face Liveness Detection
Authors:
Shashank Shekhar,
Avinash Patel,
Mrinal Haloi,
Asif Salim
Abstract:
In this paper, we present a passive method to detect face presentation attacks, a.k.a. face liveness detection, using an ensemble deep learning technique. Face liveness detection is one of the key steps in verifying a customer's identity during online onboarding and transaction processes. During identity verification, an unauthenticated user may try to bypass the verification system by several means: for example, capturing a user's photo from social media and mounting an impostor attack with a printout of the user's face or a digital photo shown on a mobile device, or even a more sophisticated attack such as a video replay attack. We studied the different attack methods and created an in-house large-scale dataset covering all of these kinds of attacks to train a robust deep learning model. We propose an ensemble method in which multiple features of the face and background regions are learned to predict whether the user is bona fide or an attacker.
Submitted 19 January, 2022;
originally announced January 2022.
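The abstract does not say how the ensemble members are combined. A minimal sketch, assuming each member (for example, one model on the face crop and one on the background region) outputs a probability that the input is bona fide, and assuming weighted soft voting with a hypothetical threshold of 0.5; the weights and model roles are illustrative only:

```python
from typing import Sequence


def ensemble_bonafide_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted soft vote over per-model bona fide probabilities (assumed combiner)."""
    if len(probs) != len(weights) or not probs:
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of probabilities; the result stays in [0, 1].
    return sum(p * w for p, w in zip(probs, weights)) / total


def classify(probs: Sequence[float], weights: Sequence[float], threshold: float = 0.5) -> str:
    """Label the sample 'bona fide' if the ensemble score reaches the (hypothetical) threshold."""
    score = ensemble_bonafide_score(probs, weights)
    return "bona fide" if score >= threshold else "attack"


# Hypothetical members: face-crop model, background model, full-frame model.
print(classify([0.9, 0.4, 0.7], [2.0, 1.0, 1.0]))
```

Soft voting keeps each member's confidence rather than only its hard decision, so a strongly confident face model can outweigh a weakly disagreeing background model.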
-
Towards Ophthalmologist Level Accurate Deep Learning System for OCT Screening and Diagnosis
Authors:
Mrinal Haloi
Abstract:
In this work, we propose an advanced AI-based grading system for OCT images. The proposed system is a very deep, fully convolutional attentive classification network trained end to end with advanced transfer learning and online random augmentation. The system uses quasi-random augmentation and outputs confidence values for disease prevalence during inference. It is a fully automated retinal OCT analysis AI system capable of understanding pathological lesions without any offline preprocessing/postprocessing step or manual feature extraction. We present state-of-the-art performance on the publicly available Mendeley OCT dataset.
Submitted 12 December, 2018;
originally announced December 2018.
-
Towards Radiologist-Level Accurate Deep Learning System for Pulmonary Screening
Authors:
Mrinal Haloi,
K. Raja Rajalakshmi,
Pradeep Walia
Abstract:
In this work, we propose an advanced pneumonia and tuberculosis grading system for X-ray images. The proposed system is a very deep fully convolutional classification network with online augmentation that outputs confidence values for disease prevalence. It is a fully automated system capable of understanding disease features without any offline preprocessing step or manual feature extraction. We have achieved state-of-the-art performance on public databases such as ChestXray-14, Mendeley, the Shenzhen Hospital X-ray set and the Belarus X-ray set.
Submitted 25 June, 2018;
originally announced July 2018.
-
Rethinking Convolutional Semantic Segmentation Learning
Authors:
Mrinal Haloi
Abstract:
Deep convolutional semantic segmentation (DCSS) learning does not converge to an optimal local minimum from random parameter initialization; a model pre-trained on the same domain becomes necessary to achieve convergence. In this work, we propose a joint cooperative end-to-end learning method for DCSS. It addresses many drawbacks of existing deep semantic segmentation learning: the proposed approach learns segmentation and classification simultaneously, removing the need for a pre-trained model to achieve convergence. We present an improved Inception-based architecture with partial attention gating (PAG) over encoder information. PAG also contributes to faster convergence and better accuracy on the segmentation task. We show the effectiveness of this learning on a diabetic retinopathy classification and segmentation dataset.
Submitted 22 October, 2017;
originally announced October 2017.
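The abstract states that segmentation and classification are learned simultaneously but does not give the training objective. One common way to express such joint learning, assumed here and not taken from the paper, is a weighted sum of a per-pixel cross-entropy and an image-level cross-entropy; the weight `lam` and the function names are hypothetical:

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def joint_loss(seg_logits, seg_labels, cls_logits, cls_label, lam=0.5):
    """Pixel-wise cross-entropy plus lam * image-level cross-entropy (assumed formulation).

    seg_logits: (H, W, C_seg) per-pixel class scores
    seg_labels: (H, W) integer pixel labels
    cls_logits: (C_cls,) image-level class scores (e.g. a disease grade)
    cls_label:  integer image-level label
    """
    seg_probs = softmax(seg_logits, axis=-1)
    h, w = seg_labels.shape
    rows, cols = np.indices((h, w))
    # Probability assigned to the true class at every pixel.
    pixel_true = seg_probs[rows, cols, seg_labels]
    seg_loss = -np.mean(np.log(pixel_true + 1e-12))
    cls_probs = softmax(cls_logits)
    cls_loss = -np.log(cls_probs[cls_label] + 1e-12)
    return seg_loss + lam * cls_loss


rng = np.random.default_rng(0)
print(joint_loss(rng.normal(size=(4, 4, 2)), rng.integers(0, 2, size=(4, 4)),
                 rng.normal(size=5), 3))
```

Because both terms share the encoder in such a setup, the image-level term supplies a gradient signal to the shared layers even when the segmentation term alone trains poorly from scratch.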
-
Deep Learning: Generalization Requires Deep Compositional Feature Space Design
Authors:
Mrinal Haloi
Abstract:
Generalization error defines the discriminability and representation power of a deep model. In this work, we claim that feature space design using deep compositional functions plays a significant role in generalization, along with explicit and implicit regularization. We support our claims with several image classification experiments. We show that the information loss due to convolution and max pooling can be marginalized with the compositional design, improving generalization performance. We also show that learning rate decay acts as an implicit regularizer in deep model training.
Submitted 8 July, 2017; v1 submitted 6 June, 2017;
originally announced June 2017.
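The abstract reports that learning rate decay acts as an implicit regularizer but does not specify the schedule used. A minimal sketch of two common schedules, step decay and exponential decay; the drop factor, step interval and rate `k` are hypothetical values for illustration:

```python
import math


def step_decay(base_lr: float, epoch: int, drop: float = 0.1, every: int = 30) -> float:
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return base_lr * (drop ** (epoch // every))


def exponential_decay(base_lr: float, epoch: int, k: float = 0.05) -> float:
    """Smoothly decay the learning rate as base_lr * exp(-k * epoch)."""
    return base_lr * math.exp(-k * epoch)


for epoch in (0, 29, 30, 60, 90):
    print(epoch, step_decay(0.1, epoch), round(exponential_decay(0.1, epoch), 5))
```

Step decay holds the rate constant and then drops it sharply, while exponential decay shrinks it a little every epoch; either way the optimizer takes progressively smaller steps late in training.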
-
Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification
Authors:
Rahul Rama Varior,
Mrinal Haloi,
Gang Wang
Abstract:
Matching pedestrians across multiple camera views, known as human re-identification, is a challenging research problem that has numerous applications in visual surveillance. With the resurgence of Convolutional Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have been proposed for human re-identification with the objective of projecting the images of similar pairs (i.e., same identity) closer to each other and those of dissimilar pairs farther apart. However, current networks extract a fixed representation for each image regardless of the image it is paired with, and the comparison between the two images is done only at the final level. In this setting, the network risks failing to extract finer local patterns that may be essential to distinguish positive pairs from hard negative pairs. In this paper, we propose a gating function that selectively emphasizes such fine common local patterns by comparing mid-level features across pairs of images. This produces flexible representations of the same image depending on the image it is paired with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and demonstrate improved performance compared to a baseline Siamese CNN architecture.
Submitted 26 September, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.
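The gating function described above compares mid-level features of the two images in a pair and emphasizes their common local patterns. The sketch below is a simplified, assumed variant, not the paper's exact layer: it compares the two feature maps row by row (horizontal stripes of a pedestrian image), maps each squared distance through a Gaussian so that similar rows get gate values near 1, and adds the gated features back to the input. The bandwidth `sigma` and the residual form are assumptions:

```python
import numpy as np


def matching_gate(feat_a: np.ndarray, feat_b: np.ndarray, sigma: float = 1.0):
    """Gate two (H, W, C) feature maps by their row-wise similarity (simplified sketch).

    Returns the gated maps for image A and image B.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    # Squared Euclidean distance between corresponding horizontal stripes (rows).
    row_dist = np.sum((feat_a - feat_b) ** 2, axis=(1, 2))  # shape (H,)
    # Gaussian: distance 0 -> gate 1, large distance -> gate close to 0.
    gate = np.exp(-row_dist / (2.0 * sigma ** 2))[:, None, None]  # shape (H, 1, 1)
    # Residual form: shared patterns are boosted, dissimilar rows pass through unchanged.
    return feat_a + gate * feat_a, feat_b + gate * feat_b


a = np.ones((3, 2, 4))
b = a.copy()
b[0] += 10.0  # make the top stripe of image B very different
out_a, _ = matching_gate(a, b)
print(out_a[:, 0, 0])
```

The output for image A changes with its partner B, which is the "flexible representation" property the abstract describes: the same image yields different gated features in different pairs.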
-
An Unsupervised Method for Detection and Validation of The Optic Disc and The Fovea
Authors:
Mrinal Haloi,
Samarendra Dandapat,
Rohit Sinha
Abstract:
In this work, we present a novel method for detecting two retinal image features, the optic disc and the fovea, in colour fundus photographs of dilated eyes for a computer-aided diagnosis (CAD) system. A saliency-map-based method is used to detect the optic disc, followed by unsupervised probabilistic latent semantic analysis (pLSA) for detection validation. The validation concept is based on the distinct vessel structure in the optic disc. Using clinical knowledge of the standard location of the fovea relative to the optic disc, the macula region is estimated. A detection accuracy of 100% is achieved for the optic disc and the macula on MESSIDOR and DIARETDB1, and 98.8% on the STARE dataset.
Submitted 25 January, 2016;
originally announced January 2016.
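The abstract estimates the macula region from the standard location of the fovea relative to the optic disc but does not state the rule. A commonly cited clinical approximation, assumed here, places the fovea about 2.5 optic disc diameters temporal to the disc centre. The sketch applies that rule; the `Circle` type, the `temporal_direction` argument (which depends on which eye is imaged), the distance factor and the region radius are all hypothetical choices:

```python
from dataclasses import dataclass


@dataclass
class Circle:
    x: float  # column of the centre, in pixels
    y: float  # row of the centre, in pixels
    r: float  # radius, in pixels


def estimate_macula(od: Circle, temporal_direction: int, dd_factor: float = 2.5,
                    region_dd: float = 1.0) -> Circle:
    """Estimate the macula region from a detected optic disc (heuristic sketch).

    temporal_direction: +1 if the temporal side is to the right in the image, -1 if left.
    dd_factor: assumed fovea distance from the disc centre, in disc diameters.
    region_dd: assumed radius of the macula region, in disc diameters.
    """
    if temporal_direction not in (1, -1):
        raise ValueError("temporal_direction must be +1 or -1")
    disc_diameter = 2.0 * od.r
    fovea_x = od.x + temporal_direction * dd_factor * disc_diameter
    # The fovea is taken to lie on the horizontal line through the disc centre.
    return Circle(x=fovea_x, y=od.y, r=region_dd * disc_diameter)


print(estimate_macula(Circle(x=400.0, y=300.0, r=40.0), temporal_direction=-1))
```

Expressing distances in disc diameters keeps the rule independent of image resolution, since the detected disc itself provides the scale.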
-
Traffic Sign Classification Using Deep Inception Based Convolutional Networks
Authors:
Mrinal Haloi
Abstract:
In this work, we propose a novel deep network for traffic sign classification that achieves outstanding performance on GTSRB, surpassing all previous methods. Our deep network consists of spatial transformer layers and a modified version of the inception module specifically designed to capture local and global features together. Capturing both kinds of features allows our network to classify intraclass samples precisely, even under deformation. The spatial transformer layers make the network more robust to deformations of the input images such as translation, rotation and scaling. Unlike existing approaches that rely on hand-crafted features, multiple deep networks with huge numbers of parameters, and data augmentation, our method addresses the concern of exploding parameters and augmentation. We achieved state-of-the-art performance of 99.81% on the GTSRB dataset.
Submitted 17 July, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.
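The spatial transformer layers mentioned above warp their input with a predicted transformation. A minimal NumPy sketch of the affine case, assuming normalized coordinates in [-1, 1] and, for brevity, nearest-neighbour sampling (spatial transformer networks normally use differentiable bilinear sampling so the transformation parameters can be learned):

```python
import numpy as np


def affine_grid(theta: np.ndarray, height: int, width: int) -> np.ndarray:
    """Map each output pixel's normalized (x, y) through a 2x3 affine matrix.

    Returns an array of shape (height, width, 2) with source (x, y) in [-1, 1] space.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, height), np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3) homogeneous coords
    return coords @ theta.T  # (H, W, 2) source coordinates


def sample_nearest(image: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Nearest-neighbour sampling of an (H, W) image at normalized grid locations."""
    h, w = image.shape
    # Convert normalized coordinates back to pixel indices and clamp to the image.
    cols = np.clip(np.rint((grid[..., 0] + 1) * (w - 1) / 2), 0, w - 1).astype(int)
    rows = np.clip(np.rint((grid[..., 1] + 1) * (h - 1) / 2), 0, h - 1).astype(int)
    return image[rows, cols]


img = np.arange(12.0).reshape(3, 4)
flip_x = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # mirror horizontally
print(sample_nearest(img, affine_grid(flip_x, 3, 4)))
```

Rotation, scaling and translation are all expressible through the six entries of `theta`, which is why a learned affine transformer can undo such deformations before classification.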
-
Improved Microaneurysm Detection using Deep Neural Networks
Authors:
Mrinal Haloi
Abstract:
In this work, we propose a novel microaneurysm (MA) detection method for early diabetic retinopathy screening using color fundus images. Since MAs are usually the first lesions to appear as an indicator of diabetic retinopathy, accurate MA detection is necessary for treatment. Each pixel of the image is classified as either MA or non-MA using a deep neural network trained with dropout and the maxout activation function. No preprocessing step or manual feature extraction is required. Substantial improvements are achieved over standard MA detection methods based on a pipeline of preprocessing, feature extraction and classification followed by postprocessing. The presented method is evaluated on the publicly available Retinopathy Online Challenge (ROC) and Diaretdb1v2 databases and achieves state-of-the-art accuracy.
Submitted 17 July, 2016; v1 submitted 17 May, 2015;
originally announced May 2015.
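The pixel classifier above uses the maxout activation, in which each unit outputs the maximum over k affine pieces, max_j (x W_j + b_j). A minimal NumPy sketch of one maxout layer; the shapes and the example weights are illustrative, not the paper's configuration:

```python
import numpy as np


def maxout(x: np.ndarray, weights: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """Maxout layer.

    x:       (batch, d_in) inputs
    weights: (k, d_in, d_out) one weight matrix per affine piece
    biases:  (k, d_out) one bias vector per piece
    Returns (batch, d_out): element-wise max over the k pieces.
    """
    # z[j] = x @ W_j + b_j for every piece j -> shape (k, batch, d_out)
    z = np.einsum("bi,kio->kbo", x, weights) + biases[:, None, :]
    return z.max(axis=0)


# With pieces y = x and y = -x, a maxout unit computes |x|.
w = np.array([[[1.0]], [[-1.0]]])
b = np.zeros((2, 1))
print(maxout(np.array([[-2.0], [3.0]]), w, b))
```

Because the pieces are learned, a maxout unit can approximate any convex piecewise-linear activation; ReLU is the special case with pieces `x` and `0`. Maxout was proposed as a companion to dropout training, matching the combination named in the abstract.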
-
A Gaussian Scale Space Approach For Exudates Detection, Classification And Severity Prediction
Authors:
Mrinal Haloi,
Samarendra Dandapat,
Rohit Sinha
Abstract:
In the context of a computer-aided diagnosis system for diabetic retinopathy, we present a novel method for detecting exudates and classifying them for disease severity prediction. The method is based on a Gaussian scale-space interest map and mathematical morphology. It uses a support vector machine for classification, and the locations of the optic disc and the macula region for severity prediction. It handles luminance variation efficiently and is suitable for exudates of varied sizes. The method has been evaluated on the publicly available DIARETDB1V2 and e-ophthaEX databases. For exudate detection, the proposed method achieved a sensitivity of 96.54% and prediction of 98.35% on the DIARETDB1V2 database.
Submitted 4 May, 2015;
originally announced May 2015.
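The abstract bases exudate candidates on a Gaussian scale-space interest map without detailing its construction. A NumPy-only sketch under the assumption that the map is the maximum, over several scales, of the scale-normalized negative Laplacian of Gaussian, which responds strongly to bright blob-like structures such as exudates; the scale list and helper names are hypothetical:

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    # Separable Gaussian smoothing: rows, then columns (zero padding at the borders).
    # Assumes the image is larger than the kernel along both axes.
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)


def laplacian(image: np.ndarray) -> np.ndarray:
    # 5-point discrete Laplacian with edge replication at the borders.
    p = np.pad(image, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * image


def interest_map(image: np.ndarray, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """Max over scales of -sigma^2 * LoG; bright blobs give large positive values."""
    image = image.astype(float)
    responses = [-(s ** 2) * laplacian(blur(image, s)) for s in sigmas]
    return np.max(np.stack(responses), axis=0)


# Synthetic example: a bright 5x5 square on a dark background.
img = np.zeros((41, 41))
img[18:23, 18:23] = 1.0
resp = interest_map(img)
print(np.unravel_index(np.argmax(resp), resp.shape))
```

Multiplying by sigma squared makes responses comparable across scales, so small and large exudates can both reach high values in the same map; thresholding and morphology would then follow.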
-
A Robust Lane Detection and Departure Warning System
Authors:
Mrinal Haloi,
Dinesh Babu Jayagopi
Abstract:
In this work, we have developed a robust lane detection and departure warning technique. Our system is based on a single camera sensor. For lane detection, a modified Inverse Perspective Mapping that uses only a few extrinsic camera parameters is combined with illuminant-invariant techniques. Lane markings are represented using a combination of 2nd- and 4th-order steerable filters, which is robust to shadowing. The effects of shadowing and extra sunlight are removed using the Lab color space and an illuminant-invariant representation. Lanes are assumed to be cubic curves and are fitted using robust RANSAC. This method can reliably detect the lanes of the road and its boundary. It has been tested in Indian road conditions under different challenging situations, and the results obtained were very good. For the lane departure angle, an optical flow based method is used.
Submitted 28 April, 2015;
originally announced April 2015.
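The lanes are modelled as cubic curves fitted with RANSAC. A minimal sketch, assuming lane-marking pixels have already been extracted in the bird's-eye (IPM) view and modelling the lateral position x as a cubic in the longitudinal coordinate y; the iteration count, inlier tolerance and seed are hypothetical:

```python
import numpy as np


def ransac_cubic(y, x, n_iter=200, tol=2.0, seed=0):
    """Fit x = a*y^3 + b*y^2 + c*y + d robustly with RANSAC.

    Returns (coefficients, inlier_mask); coefficients are highest degree first,
    as used by np.polyfit / np.polyval.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.size < 4:
        raise ValueError("a cubic needs at least 4 points")
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(n_iter):
        # Minimal sample: 4 points determine a cubic.
        idx = rng.choice(y.size, size=4, replace=False)
        coeffs = np.polyfit(y[idx], x[idx], 3)
        mask = np.abs(np.polyval(coeffs, y) - x) < tol
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    # Refit on all inliers of the best hypothesis.
    coeffs = np.polyfit(y[best_mask], x[best_mask], 3)
    return coeffs, best_mask


# Synthetic lane points with every 10th point corrupted (e.g. a shadow edge).
yy = np.arange(100, dtype=float)
xx = np.polyval([1e-4, -0.01, 0.5, 10.0], yy)
xx[::10] += 50.0
coeffs, inliers = ransac_cubic(yy, xx)
print(coeffs, inliers.sum())
```

Fitting x as a function of y (rather than the reverse) suits near-vertical lanes in the bird's-eye view, where each row contains at most one point of a given lane marking.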
-
Vehicle Local Position Estimation System
Authors:
Mrinal Haloi,
Dinesh Babu Jayagopi
Abstract:
In this paper, a robust vehicle local position estimation system using a single camera sensor and GPS is presented. A modified Inverse Perspective Mapping, illuminant-invariant techniques and an object-detection-based approach are used to localize the vehicle on the road. The vehicle's current lane, its position relative to the road boundary, and other cars are used to define its local position. For this purpose, lane markings are detected using a Laplacian edge feature, which is robust to shadowing. The effects of shadowing and extra sunlight are removed using the Lab color space and illuminant-invariant techniques. Lanes are modelled as parabolas and fitted using robust RANSAC. This method can reliably detect all lanes of the road and estimate the lane departure angle and the vehicle's local position relative to the lanes, the road boundary and other cars. Different types of obstacles, such as pedestrians and vehicles, are detected using a HOG-feature-based deformable part model.
Submitted 23 March, 2015;
originally announced March 2015.
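The abstract relies on a modified inverse perspective mapping (IPM) whose modification is not described. The sketch below shows only the standard planar-homography form of IPM: a 3x3 matrix `H` maps image pixels to bird's-eye ground-plane coordinates, here estimated from four pixel/ground correspondences. The correspondence values (a road trapezoid mapped to a 3.5 m by 30 m rectangle) are hypothetical:

```python
import numpy as np


def homography_from_points(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve for H (normalized so H[2, 2] = 1) from exactly four point correspondences."""
    a, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        # x = (h0 u + h1 v + h2) / (h6 u + h7 v + 1), and similarly for y with h3..h5.
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.extend([x, y])
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(h: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates to (N, 2) ground-plane coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ h.T
    # Divide by the homogeneous coordinate to get Cartesian ground coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


src = np.array([[300.0, 400.0], [340.0, 400.0], [100.0, 600.0], [540.0, 600.0]])  # pixels
dst = np.array([[0.0, 30.0], [3.5, 30.0], [0.0, 0.0], [3.5, 0.0]])  # metres (hypothetical)
H = homography_from_points(src, dst)
print(apply_homography(H, np.array([[320.0, 500.0]])))
```

In the ground-plane frame, parallel lane markings become parallel lines, so lateral offsets from lanes and the road boundary can be measured directly in metres.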
-
A novel pLSA based Traffic Signs Classification System
Authors:
Mrinal Haloi
Abstract:
In this work, we developed a novel and fast traffic sign recognition system, a very important part of advanced driver assistance systems and autonomous driving. Traffic signs play a vital role in safe driving and accident avoidance. We used image processing and the topic discovery model pLSA to tackle this challenging multiclass classification problem. Our algorithm consists of two parts, shape classification and sign classification, for improved accuracy. For image processing and representation, we used a bag-of-features model with the SIFT local descriptor, in which a visual vocabulary of 300 words is formed using the k-means codebook formation algorithm. We exploited the concept that every image is a collection of visual topics, and that images sharing the same topics belong to the same category. Our algorithm was tested on the German Traffic Sign Recognition Benchmark (GTSRB) and gives very promising results, close to existing state-of-the-art techniques.
Submitted 23 March, 2015;
originally announced March 2015.
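pLSA models each image d as a mixture of latent visual topics z, with P(w|d) = sum over z of P(w|z) P(z|d), fitted on bag-of-visual-words counts. A minimal NumPy sketch of the standard EM procedure, assuming the count matrix (images by codewords; 300 codewords in the abstract's setup) has already been built from SIFT descriptors and the k-means codebook; the topic count, iteration count and toy data are hypothetical:

```python
import numpy as np


def plsa(counts: np.ndarray, n_topics: int, n_iter: int = 100, seed: int = 0):
    """Fit pLSA with EM.

    counts: (n_docs, n_words) visual-word counts per image.
    Returns p_w_z (n_topics, n_words) = P(w|z) and p_z_d (n_docs, n_topics) = P(z|d).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d); shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # Expected counts n(d,w) P(z|d,w).
        weighted = counts[:, None, :] * post
        # M-step: re-estimate and renormalize both distributions.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d) -> float:
    p_w_d = p_z_d @ p_w_z  # (n_docs, n_words) = P(w|d)
    return float(np.sum(counts * np.log(np.maximum(p_w_d, 1e-12))))


toy = np.array([[10, 10, 0, 0], [8, 12, 0, 0], [0, 0, 9, 11], [0, 0, 12, 8]], dtype=float)
p_w_z, p_z_d = plsa(toy, n_topics=2, n_iter=50)
print(np.round(p_z_d, 3))
```

The per-image topic mixture P(z|d) then serves as a compact feature: images dominated by the same topics can be assigned to the same category.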
-
Characterizing driving behavior using automatic visual analysis
Authors:
Mrinal Haloi,
Dinesh Babu Jayagopi
Abstract:
In this work, we address the problem of rash driving detection using a single wide-angle camera sensor, which is particularly relevant in the Indian context. To our knowledge, rash driving detection has not been addressed using image processing techniques (existing works use other sensors such as accelerometers). The car image processing literature, though rich and mature, does not address the rash driving problem. In this work-in-progress paper, we present the need to address this problem, our approach, and our future plans to build a rash driving detector.
Submitted 13 March, 2015;
originally announced March 2015.