-
Towards Total Recall in Industrial Anomaly Detection
Authors:
Karsten Roth,
Latha Pemula,
Joaquin Zepeda,
Bernhard Schölkopf,
Thomas Brox,
Peter Gehler
Abstract:
Being able to spot defective parts is a critical component in large-scale industrial manufacturing. A particular challenge that we address in this work is the cold-start problem: fitting a model using only nominal (non-defective) example images. While handcrafted solutions per class are possible, the goal is to build systems that automatically work well on many different tasks simultaneously. The best-performing approaches combine embeddings from ImageNet models with an outlier detection model. In this paper, we extend this line of work and propose PatchCore, which uses a maximally representative memory bank of nominal patch features. PatchCore offers competitive inference times while achieving state-of-the-art performance for both detection and localization. On the challenging, widely used MVTec AD benchmark, PatchCore achieves an image-level anomaly detection AUROC of up to 99.6%, more than halving the error compared to the next best competitor. We further report competitive results on two additional datasets and in the few-sample regime. (*Work done during a research internship at Amazon AWS.) Code: github.com/amazon-research/patchcore-inspection.
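The core idea described above (a memory bank of nominal patch features, with test patches scored by their nearest-neighbour distance to it) can be illustrated with the minimal sketch below. It assumes patch features have already been extracted (for example from an ImageNet-pretrained backbone), uses plain greedy k-center selection as a stand-in for building a "maximally representative" memory bank, and takes the largest patch-level nearest-neighbour distance as the image score. The function names are hypothetical, and the sketch does not attempt to reproduce the paper's exact procedure; the linked repository is the reference for that.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two patch-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_coreset(features: List[Vector], size: int, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: greedy k-center subsampling of nominal patch features.

    Starting from a random feature, repeatedly keep the feature that is farthest
    from everything selected so far, so the subset covers the feature space.
    """
    rng = random.Random(seed)
    first = rng.randrange(len(features))
    selected = [first]
    # min_dist[i] = distance from feature i to its nearest already-selected feature
    min_dist = [euclidean(f, features[first]) for f in features]
    while len(selected) < min(size, len(features)):
        nxt = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, euclidean(f, features[nxt])) for d, f in zip(min_dist, features)]
    return [features[i] for i in selected]


def anomaly_score(test_patches: List[Vector], memory_bank: List[Vector]) -> float:
    """Image-level score: the largest nearest-neighbour distance among the test patches."""
    return max(min(euclidean(p, m) for m in memory_bank) for p in test_patches)


if __name__ == "__main__":
    # Toy 2-D "patch features" from nominal images (illustrative values only).
    nominal = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
    bank = greedy_coreset(nominal, size=2)
    print(anomaly_score([[0.0, 0.05]], bank), anomaly_score([[3.0, 3.0]], bank))
```

Subsampling trades a little accuracy for a much smaller bank, which is what keeps nearest-neighbour scoring affordable at inference time; a patch far from every retained nominal feature drives the image score up.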
Submitted 5 May, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Geometric Proxies for Live RGB-D Stream Enhancement and Consolidation
Authors:
Adrien Kaiser,
José Alonso Ybanez Zepeda,
Tamy Boubekeur
Abstract:
We propose a geometric superstructure for unified real-time processing of RGB-D data. Modern RGB-D sensors are widely used for indoor 3D capture, with applications ranging from modeling to robotics and augmented reality. Nevertheless, their use is limited by their low resolution, and their frames are often corrupted by noise, missing data and temporal inconsistencies. Our approach consists of generating, and updating over time, a single set of compact local statistics parameterized over detected geometric proxies, which are fed from raw RGB-D data. Our proxies provide several processing primitives, which improve the quality of the RGB-D stream on the fly or lighten further operations. Experimental results confirm that, compared to state-of-the-art methods, our lightweight analysis framework copes well with embedded execution and with moderate memory and computational capabilities. Processing RGB-D data with our proxies enables noise and temporal flickering removal, hole filling and resampling. As a substitute for the observed scene, our proxies can additionally be applied to compression and scene reconstruction. We present experiments performed with our framework on indoor scenes of various kinds from a recent open RGB-D dataset.
Submitted 21 January, 2020;
originally announced January 2020.
-
Plane Pair Matching for Efficient 3D View Registration
Authors:
Adrien Kaiser,
José Alonso Ybanez Zepeda,
Tamy Boubekeur
Abstract:
We present a novel method to estimate the motion matrix between overlapping pairs of 3D views in the context of indoor scenes. We use the Manhattan world assumption to introduce lightweight geometric constraints in the form of planes into the problem, which reduces complexity by taking the structure of the scene into account. In particular, we define a stochastic framework to categorize planes as vertical or horizontal and as parallel or non-parallel. We leverage this classification to match pairs of planes in overlapping views with point-of-view-agnostic structural metrics. We propose to split the motion computation using the classification and to estimate the rotation and translation of the sensor separately, using a quadric minimizer. We validate our approach on a toy example and present quantitative experiments on a public RGB-D dataset, comparing against recent state-of-the-art methods. Our evaluation shows that planar constraints add only low computational overhead while improving precision when applied after a prior coarse estimate. We conclude by giving hints towards extensions and improvements of the current results.
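The two ingredients named above, categorizing planes and estimating translation separately from rotation, can be illustrated with the simplified sketch below. It is not the authors' method: deterministic angle thresholds replace their stochastic categorization, the up axis is assumed known, the rotation is assumed already estimated, and translation is recovered by ordinary least squares. That last step relies on a standard identity: if a source plane $n \cdot x = d$ is moved by $x' = Rx + t$, it becomes $n' \cdot x' = d + n' \cdot t$ with $n' = Rn$, so every matched pair gives one linear equation $n' \cdot t = d' - d$. All function names and tolerances are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def classify_plane(normal: Vec3, up: Vec3 = (0.0, 0.0, 1.0), tol_deg: float = 10.0) -> str:
    """Label a unit plane normal relative to an assumed-known up axis."""
    c = abs(dot(normal, up))
    if c >= math.cos(math.radians(tol_deg)):
        return "horizontal"  # normal along up: floor, ceiling, table top
    if c <= math.sin(math.radians(tol_deg)):
        return "vertical"  # normal perpendicular to up: walls
    return "other"


def are_parallel(n1: Vec3, n2: Vec3, tol_deg: float = 10.0) -> bool:
    """Treat two planes as parallel when their unit normals are (anti)aligned."""
    return abs(dot(n1, n2)) >= math.cos(math.radians(tol_deg))


def det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_translation(matches: List[Tuple[Vec3, float, float]]) -> Vec3:
    """Least-squares t from matched planes given as (n', d_src, d_dst).

    n' is the source normal already rotated into the destination frame,
    so each match contributes the equation n' . t = d_dst - d_src.
    """
    ata = [[sum(n[i] * n[j] for n, _, _ in matches) for j in range(3)] for i in range(3)]
    atb = [sum(n[i] * (d_dst - d_src) for n, d_src, d_dst in matches) for i in range(3)]
    det = det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("need at least three mutually non-parallel planes")
    # Cramer's rule on the 3x3 normal equations (A^T A) t = A^T b
    t = []
    for k in range(3):
        mk = [[atb[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        t.append(det3(mk) / det)
    return (t[0], t[1], t[2])
```

Splitting the problem this way mirrors the rotation-then-translation decomposition in the abstract: once normals are aligned, translation becomes a small linear system, and it is well-posed only when the matched planes span three independent directions.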
Submitted 20 January, 2020;
originally announced January 2020.
-
Vine Robots: Design, Teleoperation, and Deployment for Navigation and Exploration
Authors:
Margaret M. Coad,
Laura H. Blumenschein,
Sadie Cutler,
Javier A. Reyna Zepeda,
Nicholas D. Naclerio,
Haitham El-Hussieny,
Usman Mehmood,
Jee-Hwan Ryu,
Elliot W. Hawkes,
Allison M. Okamura
Abstract:
A new class of continuum robots has recently been explored, characterized by tip extension, significant length change, and directional control. Here, we call this class of robots "vine robots," because their behavior resembles that of plants with a trailing growth habit. Due to their growth-based movement, vine robots are well suited for navigation and exploration in cluttered environments, but until now, they have not been deployed outside the lab. Portability of these robots and steerability at length scales relevant for navigation are key to field applications. In addition, intuitive human-in-the-loop teleoperation enables movement in unknown and dynamic environments. We present a vine robot system that is teleoperated using a custom-designed flexible joystick and camera system, long enough for use in navigation tasks, and portable for use in the field. We report on deployment of this system in two scenarios: a soft robot navigation competition and exploration of an archaeological site. The competition course required movement over uneven terrain, past unstable obstacles, and through a small aperture. The archaeological site required movement over rocks and through horizontal and vertical turns. The robot tip successfully moved past the obstacles and through the tunnels, demonstrating the capability of vine robots to achieve navigation and exploration tasks in the field.
Submitted 6 January, 2020; v1 submitted 28 February, 2019;
originally announced March 2019.
-
Learning a Complete Image Indexing Pipeline
Authors:
Himalaya Jain,
Joaquin Zepeda,
Patrick Pérez,
Rémi Gribonval
Abstract:
To work at scale, a complete image indexing system comprises two components: an inverted file index that restricts the actual search to a subset that should contain most of the items relevant to the query, and an approximate distance computation mechanism that rapidly scans these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose a first system that learns both components within a unifying neural framework of structured binary encoding.
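The two-component pipeline described above can be illustrated with the sketch below of a conventional inverted file index, i.e. the unsupervised-clustering baseline the abstract refers to, not the learned system it proposes. The coarse centroids are assumed to come from some clustering step (for example k-means) that is not shown, exact squared distances stand in for the approximate, compressed-code distances a real system would use, and all names and the `nprobe`/`topk` parameters are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Vector, centroids: List[Vector]) -> int:
    """Index of the coarse centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def build_ivf(database: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Inverted file: map each coarse centroid to the ids of the vectors assigned to it."""
    lists: Dict[int, List[int]] = {k: [] for k in range(len(centroids))}
    for i, x in enumerate(database):
        lists[nearest(x, centroids)].append(i)
    return lists


def ivf_search(query: Vector, database: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], nprobe: int = 1, topk: int = 1) -> List[int]:
    """Probe the nprobe closest lists, then rank only their members."""
    probed = sorted(range(len(centroids)), key=lambda k: sq_dist(query, centroids[k]))[:nprobe]
    candidates = [i for k in probed for i in lists[k]]
    # Exact distances stand in here for the approximate distances of a real system.
    return sorted(candidates, key=lambda i: sq_dist(query, database[i]))[:topk]
```

Raising `nprobe` scans more lists, improving recall at the cost of speed; the abstract's point is that the assignment to lists, and not only the distance computation, can be learned.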
Submitted 12 December, 2017;
originally announced December 2017.
-
SUBIC: A supervised, structured binary code for image search
Authors:
Himalaya Jain,
Joaquin Zepeda,
Patrick Pérez,
Rémi Gribonval
Abstract:
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and novel architectures ushered in by the deep learning revolution. We therefore propose a novel method to make deep convolutional neural networks produce supervised, compact, structured binary codes for visual search. Our method makes use of a novel block-softmax non-linearity and of batch-based entropy losses that together induce structure in the learned encodings. We show that our method outperforms state-of-the-art compact representations based on deep hashing or structured quantization in single- and cross-domain category retrieval, instance retrieval and classification. We make our code and models publicly available online.
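The block-softmax idea mentioned above can be illustrated with the sketch below: the network output is split into equal-size blocks, a softmax is applied within each block, and a structured binary code keeps exactly one active bit per block. The two entropy terms reflect one plausible reading of "batch-based entropy losses" (low per-sample entropy pushes each block toward one-hot, high entropy of the batch-averaged block pushes toward uniform use of all bits); the exact losses, their weights, and the network itself are not reproduced here, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def block_softmax(z: Sequence[float], num_blocks: int) -> List[float]:
    """Apply a softmax independently to each of num_blocks equal-size blocks of z."""
    k = len(z) // num_blocks
    assert k * num_blocks == len(z), "len(z) must be a multiple of num_blocks"
    out: List[float] = []
    for m in range(num_blocks):
        block = z[m * k:(m + 1) * k]
        peak = max(block)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in block]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


def binarize(probs: Sequence[float], num_blocks: int) -> List[int]:
    """Structured binary code: exactly one active bit (the argmax) per block."""
    k = len(probs) // num_blocks
    code: List[int] = []
    for m in range(num_blocks):
        block = list(probs[m * k:(m + 1) * k])
        winner = block.index(max(block))
        code.extend(1 if i == winner else 0 for i in range(k))
    return code


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_terms(batch: List[Sequence[float]], num_blocks: int) -> Tuple[float, float]:
    """(mean per-sample block entropy, entropy of batch-averaged blocks), averaged over blocks."""
    k = len(batch[0]) // num_blocks
    n = len(batch)
    mean_ent, batch_ent = 0.0, 0.0
    for m in range(num_blocks):
        blocks = [p[m * k:(m + 1) * k] for p in batch]
        mean_ent += sum(entropy(b) for b in blocks) / n
        avg = [sum(b[i] for b in blocks) / n for i in range(k)]
        batch_ent += entropy(avg)
    return mean_ent / num_blocks, batch_ent / num_blocks
```

With M blocks of size K, the code occupies M·K bits but stores only M·log2(K) bits of information, which is what makes it compact and structured. A training objective in this spirit would minimize the first entropy term and maximize the second.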
Submitted 9 August, 2017;
originally announced August 2017.
-
Approximate search with quantized sparse representations
Authors:
Himalaya Jain,
Patrick Pérez,
Rémi Gribonval,
Joaquin Zepeda,
Hervé Jégou
Abstract:
This paper tackles the task of storing a large collection of vectors, such as visual descriptors, and of searching it. To this end, we propose to approximate database vectors by constrained sparse coding, where the possible atom weights are restricted to a finite subset. This formulation encompasses, as particular cases, previous state-of-the-art methods such as product or residual quantization. As opposed to traditional sparse coding methods, quantized sparse coding includes memory usage as a design constraint, thereby allowing us to index a large collection such as the billion-sized BIGANN benchmark. Our experiments, carried out on standard benchmarks, show that our formulation leads to competitive solutions under different trade-offs between learning/coding time, index size and search quality.
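Residual quantization, which the abstract names as a particular case of the formulation, can be illustrated with the sketch below. Each stage picks one codeword for the current residual, so the reconstruction is a sum of selected atoms: a sparse code whose weights are restricted to the finite set {0, 1}, with one nonzero weight per codebook. This is the special case, not the paper's general quantized sparse coding; the codebooks here are hand-picked rather than learned, the search compares the uncompressed query to full reconstructions for simplicity, and all names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = List[List[float]]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_encode(x: Vector, codebooks: List[Codebook]) -> List[int]:
    """Greedy residual quantization: each stage picks the codeword nearest to the current residual."""
    residual = list(x)
    codes: List[int] = []
    for cb in codebooks:
        k = min(range(len(cb)), key=lambda j: sq_dist(residual, cb[j]))
        codes.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return codes


def decode(codes: List[int], codebooks: List[Codebook]) -> List[float]:
    """Sum of the selected atoms: a sparse code with one unit weight per codebook."""
    out = [0.0] * len(codebooks[0][0])
    for k, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[k])]
    return out


def search(query: Vector, db_codes: List[List[int]], codebooks: List[Codebook]) -> int:
    """Index of the database item whose reconstruction is closest to the query."""
    return min(range(len(db_codes)), key=lambda i: sq_dist(query, decode(db_codes[i], codebooks)))
```

Memory per vector is fixed by the codebook sizes (here one small integer per stage), which is the kind of design constraint the abstract describes as making billion-scale indexing feasible.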
Submitted 10 August, 2016;
originally announced August 2016.
-
Hybrid multi-layer Deep CNN/Aggregator feature for image classification
Authors:
Praveen Kulkarni,
Joaquin Zepeda,
Frederic Jurie,
Patrick Perez,
Louis Chevallier
Abstract:
Deep Convolutional Neural Networks (DCNNs) have established a remarkable performance benchmark in the field of image classification, displacing classical approaches based on hand-tailored aggregations of local descriptors. Yet DCNNs impose high computational burdens at both training and testing time, and training them requires collecting and annotating large amounts of training data. Supervised adaptation methods that partially re-learn a transferred DCNN structure on a new target dataset have been proposed in the literature. Yet these require expensive bounding-box annotations and are still computationally expensive to learn. In this paper, we address these shortcomings of DCNN adaptation schemes by proposing a hybrid approach that combines conventional, unsupervised aggregators such as Bag-of-Words (BoW) with the DCNN pipeline by treating the output of intermediate layers as densely extracted local descriptors.
We test a variant of our approach that uses only intermediate DCNN layers on the standard PASCAL VOC 2007 dataset and show that it performs significantly better than the standard BoW model and comparably to Fisher vector aggregation, but with a feature that is 150 times smaller. A second variant of our approach that includes the fully connected DCNN layers significantly outperforms Fisher vector schemes and performs comparably to DCNN approaches adapted to PASCAL VOC 2007, yet at only a small fraction of the training and testing cost.
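The central trick above, treating an intermediate layer's output as densely extracted local descriptors and aggregating them with BoW, can be illustrated with the sketch below. It assumes a channels-by-height-by-width activation map, a visual vocabulary already learned offline (for example by k-means, not shown), hard assignment and L1 normalization; these are choices of this sketch rather than details confirmed by the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_map_to_descriptors(fmap: List[List[List[float]]]) -> List[List[float]]:
    """fmap[c][y][x] (channels, height, width) -> one C-dim local descriptor per spatial position."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][y][x] for c in range(channels)] for y in range(height) for x in range(width)]


def bow_histogram(descriptors: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """Hard-assign each descriptor to its nearest visual word, then L1-normalize the counts."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda j: sq_dist(d, vocabulary[j]))
        hist[word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The resulting histogram has one entry per visual word regardless of image size, so it can feed a standard linear classifier without retraining the DCNN itself.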
Submitted 13 March, 2015;
originally announced March 2015.