Search | arXiv e-print repository

Cross-Modal Distillation in Industrial Anomaly Detection: Exploring Efficient Multi-Modal IAD

Authors: Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau

Abstract: Recent studies of multi-modal Industrial Anomaly Detection (IAD) based on point clouds and RGB images indicated the importance of exploiting redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multi-modal IAD in practical production lines remains a work in progress that requires consideration of the trade-offs between costs and benefits… ▽ More Recent studies of multi-modal Industrial Anomaly Detection (IAD) based on point clouds and RGB images indicated the importance of exploiting redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multi-modal IAD in practical production lines remains a work in progress that requires consideration of the trade-offs between costs and benefits associated with introducing new modalities, while ensuring compatibility with current processes. Combining fast in-line inspections with high-resolution, time-consuming, near-line characterization techniques to enhance detection accuracy fits well into the existing quality control process, but only part of the samples can be tested with expensive near-line methods. Thus, the model must have the ability to leverage multi-modal training and handle incomplete modalities during inference. One solution is generating cross-modal hallucination to transfer knowledge among modalities for missing modality issues. In this paper, we propose CMDIAD, a Cross-Modal Distillation framework for IAD to demonstrate the feasibility of Multi-modal Training, Few-modal Inference pipeline. Moreover, we investigate reasons behind the asymmetric performance improvement using point clouds or RGB images as main modality of inference. This lays the foundation of our future multi-modal dataset construction for efficient IAD from manufacturing scenarios. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2403.15026 [pdf, other]

VRSO: Visual-Centric Reconstruction for Static Object Annotation

Authors: Chenyao Yu, Yingfeng Cai, Jiaxin Zhang, Hui Kong, Wei Sui, Cong Yang

Abstract: As a part of the perception results of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for driving environment understanding. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional, also reliable, way is manual labeling over the dense LiDAR point clouds and reference images. T… ▽ More As a part of the perception results of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for driving environment understanding. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional, also reliable, way is manual labeling over the dense LiDAR point clouds and reference images. Though most public driving datasets adopt this strategy to provide SOD ground truth (GT), it is still expensive (requires LiDAR scanners) and low-efficient (time-consuming and unscalable) in practice. This paper introduces VRSO, a visual-centric approach for static object annotation. VRSO is distinguished in low cost, high efficiency, and high quality: (1) It recovers static objects in 3D space with only camera images as input, and (2) manual labeling is barely involved since GT for SOD tasks is generated based on an automatic reconstruction and annotation pipeline. (3) Experiments on the Waymo Open Dataset show that the mean reprojection error from VRSO annotation is only 2.6 pixels, around four times lower than the Waymo labeling (10.6 pixels). Source code is available at: https://github.com/CaiYingFeng/VRSO. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: submitted to iros 2024

arXiv:2402.06854 [pdf, other]

Gyroscope-Assisted Motion Deblurring Network

Authors: Simin Luan, Cong Yang, Zeyd Boukhers, Xue Qin, Dongfeng Cheng, Wei Sui, Zhijun Li

Abstract: Image research has shown substantial attention in deblurring networks in recent years. Yet, their practical usage in real-world deblurring, especially motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthetic a… ▽ More Image research has shown substantial attention in deblurring networks in recent years. Yet, their practical usage in real-world deblurring, especially motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthetic and restore motion blur images using Inertial Measurement Unit (IMU) data. Notably, the framework includes a strategy for training triplet generation, and a Gyroscope-Aided Motion Deblurring (GAMD) network for blurred image restoration. The rationale is that through harnessing IMU data, we can determine the transformation of the camera pose during the image exposure phase, facilitating the deduction of the motion trajectory (aka. blur trajectory) for each point inside the three-dimensional space. Thus, the synthetic triplets using our strategy are inherently close to natural motion blur, strictly pixel-aligned, and mass-producible. Through comprehensive experiments, we demonstrate the advantages of the proposed framework: only two-pixel errors between our synthetic and real-world blur trajectories, a marked improvement (around 33.17%) of the state-of-the-art deblurring method MIMO on Peak Signal-to-Noise Ratio (PSNR). △ Less

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2309.11754 [pdf, other]

A Vision-Centric Approach for Static Map Element Annotation

Authors: Jiaxin Zhang, Shiyuan Chen, Haoran Yin, Ruohong Mei, Xuan Liu, Cong Yang, Qian Zhang, Wei Sui

Abstract: The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data regarding consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs… ▽ More The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data regarding consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotation can achieve high reprojection accuracy across all surrounding cameras and is spatial-temporal consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map element, models trained with annotations from CAMA achieve lower reprojection errors (e.g., 4.73 vs. 8.03 pixels). △ Less

Submitted 16 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted at 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2306.11368 [pdf, other]

RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation

Authors: Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang

Abstract: In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational ef… ▽ More In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600*600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for autolabeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe. △ Less

Submitted 21 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Published in: IEEE Transactions on Intelligent Vehicles

arXiv:2306.07467 [pdf, other]

ELF Codes: Concatenated Codes with an Expurgating Linear Function as the Outer Code

Authors: Richard Wesel, Amaael Antonini, Linfang Wang, Wenhui Sui, Brendan Towell, Holden Grissett

Abstract: An expurgating linear function (ELF) is a linear outer code that disallows the low-weight codewords of the inner code. ELFs can be designed either to maximize the minimum distance or to minimize the codeword error rate (CER) of the expurgated code. A list-decoding sieve of the inner code starting from the noiseless all-zeros codeword is an efficient way to identify ELFs that maximize the minimum d… ▽ More An expurgating linear function (ELF) is a linear outer code that disallows the low-weight codewords of the inner code. ELFs can be designed either to maximize the minimum distance or to minimize the codeword error rate (CER) of the expurgated code. A list-decoding sieve of the inner code starting from the noiseless all-zeros codeword is an efficient way to identify ELFs that maximize the minimum distance of the expurgated code. For convolutional inner codes, this paper provides distance spectrum union (DSU) upper bounds on the CER of the concatenated code. For short codeword lengths, ELFs transform a good inner code into a great concatenated code. For a constant message size of $K=64$ bits or constant codeword blocklength of $N=152$ bits, an ELF can reduce the gap at CER $10^{-6}$ between the DSU and the random-coding union (RCU) bounds from over 1 dB for the inner code alone to 0.23 dB for the concatenated code. The DSU bounds can also characterize puncturing that mitigates the rate overhead of the ELF while maintaining the DSU-to-RCU gap. The reduction in DSU-to-RCU gap comes with a minimal increase in average complexity at desired CER operating points. List Viterbi decoding guided by the ELF approaches maximum likelihood (ML) decoding of the concatenated code, and average list size converges to 1 as SNR increases. Thus, average complexity is similar to Viterbi decoding on the trellis of the inner code at high SNR. For rare large-magnitude noise events, which occur less often than the FER of the inner code, a deep search in the list finds the ML codeword. △ Less

Submitted 1 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 6 arXiv pages (actual ISTC paper is 5 pages with more compressed spacing), 6 figures, accepted to the 2023 International Symposium on Techniques in Coding. Latest version is Camera-Ready version for ISTC edited for clarity and to reflect reviewer suggestions and references were added

arXiv:2304.09807 [pdf, other]

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Abstract: High-definition (HD) map serves as the essential infrastructure of autonomous driving. In this work, we build up a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD map of large-scale driving scene. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geo… ▽ More High-definition (HD) map serves as the essential infrastructure of autonomous driving. In this work, we build up a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD map of large-scale driving scene. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as unified point sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requiring negligible human effort, and flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as NYC Planimetric Database. VMA can significantly improve map generation efficiency and require little human effort. On average VMA takes 160min for annotating a scene with a range of hundreds of meters, and reduces 52.3% of the human cost, showing great application value. Code: https://github.com/hustvl/VMA. △ Less

Submitted 27 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: https://github.com/hustvl/VMA

arXiv:2212.04224 [pdf, other]

doi 10.3390/s22239375

Towards Accurate Ground Plane Normal Estimation from Ego-Motion

Authors: Jiaxin Zhang, Wei Sui, Qian Zhang, Tao Chen, Cong Yang

Abstract: In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane is dynamically changed due to braking and unstable road surface. As a result, the vehicle pose, especially the pitch angle, is oscillating from subtle to obvious. Thus, estimating ground plane normal is meaningful since it can be encoded to improve the robustness of va… ▽ More In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane is dynamically changed due to braking and unstable road surface. As a result, the vehicle pose, especially the pitch angle, is oscillating from subtle to obvious. Thus, estimating ground plane normal is meaningful since it can be encoded to improve the robustness of various autonomous driving tasks (e.g., 3D object detection, road surface reconstruction, and trajectory planning). Our proposed method only uses odometry as input and estimates accurate ground plane normal vectors in real time. Particularly, it fully utilizes the underlying connection between the ego pose odometry (ego-motion) and its nearby ground plane. Built on that, an Invariant Extended Kalman Filter (IEKF) is designed to estimate the normal vector in the sensor's coordinate. Thus, our proposed method is simple yet efficient and supports both camera- and inertial-based odometry algorithms. Its usability and the marked improvement of robustness are validated through multiple experiments on public datasets. For instance, we achieve state-of-the-art accuracy on KITTI dataset with the estimated vector error of 0.39°. Our code is available at github.com/manymuch/ground_normal_filter. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Journal ref: Sensors 2022, 22(23), 9375;

arXiv:2212.04064 [pdf, ps, other]

CRC-Aided High-Rate Convolutional Codes With Short Blocklengths for List Decoding

Authors: Wenhui Sui, Brendan Towell, Ava Asmani, Hengjie Yang, Holden Grissett, Richard D. Wesel

Abstract: Recently, rate-1/n zero-terminated (ZT) and tail-biting (TB) convolutional codes (CCs) with cyclic redundancy check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRC polynomials for rate- (n-1)/n ZT and TB CCs with short blocklengths. This paper considers both standard rate-(n-1)/n CC polynomials and rat… ▽ More Recently, rate-1/n zero-terminated (ZT) and tail-biting (TB) convolutional codes (CCs) with cyclic redundancy check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRC polynomials for rate- (n-1)/n ZT and TB CCs with short blocklengths. This paper considers both standard rate-(n-1)/n CC polynomials and rate- (n-1)/n designs resulting from puncturing a rate-1/2 code. The CRC polynomials are chosen to maximize the minimum distance d_min and minimize the number of nearest neighbors A_(d_min) . For the standard rate-(n-1)/n codes, utilization of the dual trellis proposed by Yamada et al. lowers the complexity of CRC-aided serial list Viterbi decoding (SLVD). CRC-aided SLVD of the TBCCs closely approaches the RCU bound at a blocklength of 128. This paper compares the FER performance (gap to the RCU bound) and complexity of the CRC-aided standard and punctured ZTCCs and TBCCs. This paper also explores the complexity-performance trade-off for three TBCC decoders: a single-trellis approach, a multi-trellis approach, and a modified single-trellis approach with pre-processing using the wrap around Viterbi algorithm. △ Less

Submitted 9 October, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2111.07929

arXiv:2112.08635 [pdf, other]

Road-aware Monocular Structure from Motion and Homography Estimation

Authors: Wei Sui, Teng Chen, Jiaxin Zhang, Jiao Lu, Qian Zhang

Abstract: Structure from motion (SFM) and ground plane homography estimation are critical to autonomous driving and other robotics applications. Recently, much progress has been made in using deep neural networks for SFM and homography estimation respectively. However, directly applying existing methods for ground plane homography estimation may fail because the road is often a small part of the scene. Besi… ▽ More Structure from motion (SFM) and ground plane homography estimation are critical to autonomous driving and other robotics applications. Recently, much progress has been made in using deep neural networks for SFM and homography estimation respectively. However, directly applying existing methods for ground plane homography estimation may fail because the road is often a small part of the scene. Besides, the performances of deep SFM approaches are still inferior to traditional methods. In this paper, we propose a method that learns to solve both problems in an end-to-end manner, improving performance on both. The proposed networks consist of a Depth-CNN, a Pose-CNN and a Ground-CNN. The Depth-CNN and Pose-CNN estimate dense depth map and ego-motion respectively, solving SFM, while the Pose-CNN and Ground-CNN followed by a homography layer solve the ground plane estimation problem. By enforcing coherency between SFM and homography estimation results, the whole network can be trained end to end using photometric loss and homography loss without any groundtruth except the road segmentation provided by an off-the-shelf segmenter. Comprehensive experiments are conducted on KITTI benchmark to demonstrate promising results compared with various state-of-the-art approaches. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: 10 pages

arXiv:2111.11089 [pdf, other]

doi 10.1109/TIP.2023.3289323

Monocular Road Planar Parallax Estimation

Authors: Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, Qian Zhang

Abstract: Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or directly predicting the depth of points via deep learning. However, the former is expensive, and the latter lacks the use of geometry information for the scene. In this paper, instead of following ex… ▽ More Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or directly predicting the depth of points via deep learning. However, the former is expensive, and the latter lacks the use of geometry information for the scene. In this paper, instead of following existing methodologies, we propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $γ$ map (the ratio of height to depth) for 3D reconstruction. The $γ$ map has the potential to construct a two-dimensional transformation between two consecutive frames. It implies planar parallax and can be combined with the road plane serving as a reference to estimate the 3D structure by war** the consecutive frames. Furthermore, we introduce a novel cross-attention module to make the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments are conducted on the sampled dataset to demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios. △ Less

Submitted 9 July, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted by IEEE TIP

arXiv:2111.07929 [pdf, other]

High-Rate Convolutional Codes with CRC-Aided List Decoding for Short Blocklengths

Authors: Wenhui Sui, Hengjie Yang, Brendan Towell, Ava Asmani, Richard D. Wesel

Abstract: Recently, rate-$1/ω$ zero-terminated and tail-biting convolutional codes (ZTCCs and TBCCs) with cyclic-redundancy-check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRCs for rate-$(ω-1)/ω$ CCs with short blocklengths, considering both the ZT and TB cases. The CRC design seeks to optimize the frame error… ▽ More Recently, rate-$1/ω$ zero-terminated and tail-biting convolutional codes (ZTCCs and TBCCs) with cyclic-redundancy-check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRCs for rate-$(ω-1)/ω$ CCs with short blocklengths, considering both the ZT and TB cases. The CRC design seeks to optimize the frame error rate (FER) performance of the code resulting from the concatenation of the CRC and the CC. Utilization of the dual trellis proposed by Yamada \emph{et al.} lowers the complexity of CRC-aided serial list Viterbi decoding (SLVD) of ZTCCs and TBCCs. CRC-aided SLVD of the TBCCs closely approaches the RCU bound at a blocklength of $128$. △ Less

Submitted 7 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 6 pages; submitted to 2022 IEEE International Conference on Communications (ICC 2022)

arXiv:2103.10029 [pdf, other]

doi 10.1109/ICRA48506.2021.9561642

Deep Online Correction for Monocular Visual Odometry

Authors: Jiaxin Zhang, Wei Sui, Xinggang Wang, Wenming Meng, Hongmei Zhu, Qian Zhang

Abstract: In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages: First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in self-supervised manners. Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inferenc… ▽ More In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages: First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in self-supervised manners. Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inference phases. The benefits of our proposed method are twofold: 1) Different from online-learning methods, DOC does not need to calculate gradient propagation for parameters of CNNs. Thus, it saves more computation resources during inference phases. 2) Unlike hybrid methods that combine CNNs with traditional methods, DOC fully relies on deep learning (DL) frameworks. Though without complex back-end optimization modules, our method achieves outstanding performance with relative transform error (RTE) = 2.0% on KITTI Odometry benchmark for Seq. 09, which outperforms traditional monocular VO frameworks and is comparable to hybrid methods. △ Less

Submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted at 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:1811.08611 [pdf, other]

A Novel Integrated Framework for Learning both Text Detection and Recognition

Authors: Wanchen Sui, Qing Zhang, Jun Yang, Wei Chu

Abstract: In this paper, we propose a novel integrated framework for learning both text detection and recognition. For most of the existing methods, detection and recognition are treated as two isolated tasks and trained separately, since parameters of detection and recognition models are different and two models target to optimize their own loss functions during individual training processes. In contrast t… ▽ More In this paper, we propose a novel integrated framework for learning both text detection and recognition. For most of the existing methods, detection and recognition are treated as two isolated tasks and trained separately, since parameters of detection and recognition models are different and two models target to optimize their own loss functions during individual training processes. In contrast to those methods, by sharing model parameters, we merge the detection model and recognition model into a single end-to-end trainable model and train the joint model for two tasks simultaneously. The shared parameters not only help effectively reduce the computational load in inference process, but also improve the end-to-end text detection-recognition accuracy. In addition, we design a simpler and faster sequence learning method for the recognition network based on a succession of stacked convolutional layers without any recurrent structure, this is proved feasible and dramatically improves inference speed. Extensive experiments on different datasets demonstrate that the proposed method achieves very promising results. △ Less

Submitted 21 November, 2018; originally announced November 2018.

arXiv:1602.04502 [pdf, other]

Do We Need Binary Features for 3D Reconstruction?

Authors: Bin Fan, Qingqun Kong, Wei Sui, Zhiheng Wang, Xinchao Wang, Shiming Xiang, Chunhong Pan, Pascal Fua

Abstract: Binary features have been incrementally popular in the past few years due to their low memory footprints and the efficient computation of Hamming distance between binary descriptors. They have been shown with promising results on some real time applications, e.g., SLAM, where the matching operations are relative few. However, in computer vision, there are many applications such as 3D reconstructio… ▽ More Binary features have been incrementally popular in the past few years due to their low memory footprints and the efficient computation of Hamming distance between binary descriptors. They have been shown with promising results on some real time applications, e.g., SLAM, where the matching operations are relative few. However, in computer vision, there are many applications such as 3D reconstruction requiring lots of matching operations between local features. Therefore, a natural question is that is the binary feature still a promising solution to this kind of applications? To get the answer, this paper conducts a comparative study of binary features and their matching methods on the context of 3D reconstruction in a recently proposed large scale mutliview stereo dataset. Our evaluations reveal that not all binary features are capable of this task. Most of them are inferior to the classical SIFT based method in terms of reconstruction accuracy and completeness with a not significant better computational performance. △ Less

Submitted 14 February, 2016; originally announced February 2016.

Showing 1–15 of 15 results for author: Sui, W