-
TLU-Net: A Deep Learning Approach for Automatic Steel Surface Defect Detection
Authors:
Praveen Damacharla,
Achuth Rao M. V.,
Jordan Ringenberg,
Ahmad Y Javaid
Abstract:
Visual steel surface defect detection is an essential step in steel sheet manufacturing. Several machine learning-based automated visual inspection (AVI) methods have been studied in recent years. However, most steel manufacturing industries still use manual visual inspection due to training time and inaccuracies involved with AVI methods. Automatic steel defect detection methods could be useful i…
▽ More
Visual steel surface defect detection is an essential step in steel sheet manufacturing. Several machine learning-based automated visual inspection (AVI) methods have been studied in recent years. However, most steel manufacturing industries still use manual visual inspection due to training time and inaccuracies involved with AVI methods. Automatic steel defect detection methods could be useful in less expensive and faster quality control and feedback. But preparing the annotated training data for segmentation and classification could be a costly process. In this work, we propose to use the Transfer Learning-based U-Net (TLU-Net) framework for steel surface defect detection. We use a U-Net architecture as the base and explore two kinds of encoders: ResNet and DenseNet. We compare these nets' performance using random initialization and the pre-trained networks trained using the ImageNet data set. The experiments are performed using Severstal data. The results demonstrate that the transfer learning performs 5% (absolute) better than that of the random initialization in defect classification. We found that the transfer learning performs 26% (relative) better than that of the random initialization in defect segmentation. We also found the gain of transfer learning increases as the training data decreases, and the convergence rate with transfer learning is better than that of the random initialization.
△ Less
Submitted 18 January, 2021;
originally announced January 2021.
-
Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Authors:
Jogendra Nath Kundu,
Rahul M. V.,
Aditya Ganeshan,
R. Venkatesh Babu
Abstract:
Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence can not exploit the object's 3D structural information effectively. In…
▽ More
Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence can not exploit the object's 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intraclass objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model are then used to generate keypoint correspondence maps for a given monocular real image. Finally, a pose estimation network predicts 3D pose of the object using these correspondence maps. This pipeline is further extended to a multi-view approach, which assimilates keypoint information from correspondence sets generated from multiple views of the 3D template model. Fusion of multi-view information significantly improves geometric comprehension of the system which in turn enhances the pose estimation performance. Furthermore, use of correspondence framework responsible for the learning of pose invariant keypoint descriptor also allows us to effectively alleviate the data-scarcity problem. This enables our method to achieve state-of-the-art performance on multiple real-image viewpoint estimation datasets, such as Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released the codes for our proposed approach.
△ Less
Submitted 3 September, 2018;
originally announced September 2018.
-
iSPA-Net: Iterative Semantic Pose Alignment Network
Authors:
Jogendra Nath Kundu,
Aditya Ganeshan,
Rahul M. V.,
Aditya Prakash,
R. Venkatesh Babu
Abstract:
Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where the annotators use a 3D CAD model as structural refere…
▽ More
Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where the annotators use a 3D CAD model as structural reference to acquire ground-truth viewpoints for real images; we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach focuses on exploiting semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting viewpoint difference between a given pair of images. Such image comparison based approach also alleviates the problem of data scarcity and hence enhances scalability of the proposed approach for novel object categories with minimal annotation. The fine-grained object pose estimator is also aided by correspondence of learned spatial descriptor of the input image pair. The proposed pose alignment framework enjoys the faculty to refine its initial pose estimation in consecutive iterations by utilizing an online rendering setup along with effectiveness of a non-uniform bin classification of pose-difference. This enables iSPA-Net to achieve state-of-the-art performance on various real image viewpoint estimation datasets. Further, we demonstrate effectiveness of the approach for multiple applications. First, we show results for active object viewpoint localization to capture images from similar pose considering only a single image as pose reference. Second, we demonstrate the ability of the learned semantic correspondence to perform unsupervised part-segmentation transfer using only a single part-annotated 3D template model per object class. To encourage reproducible research, we have released the codes for our proposed algorithm.
△ Less
Submitted 3 August, 2018;
originally announced August 2018.