-
Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Authors:
Ricardo Knauer,
Erik Rodner
Abstract:
Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines, or more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs…
▽ More
Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines, or more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs similar to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend to consider logistic regression as the first choice for data-scarce applications with tabular data and provide practitioners with best practices for further method selection.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective
Authors:
Ricardo Knauer,
Erik Rodner
Abstract:
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into acco…
▽ More
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
CAD Models to Real-World Images: A Practical Approach to Unsupervised Domain Adaptation in Industrial Object Classification
Authors:
Dennis Ritter,
Mike Hemberger,
Marc Hönig,
Volker Stopp,
Erik Rodner,
Kristian Hildebrand
Abstract:
In this paper, we systematically analyze unsupervised domain adaptation pipelines for object classification in a challenging industrial setting. In contrast to standard natural object benchmarks existing in the field, our results highlight the most important design choices when only category-labeled CAD models are available but classification needs to be done with real-world images. Our domain ada…
▽ More
In this paper, we systematically analyze unsupervised domain adaptation pipelines for object classification in a challenging industrial setting. In contrast to standard natural object benchmarks existing in the field, our results highlight the most important design choices when only category-labeled CAD models are available but classification needs to be done with real-world images. Our domain adaptation pipeline achieves SoTA performance on the VisDA benchmark, but more importantly, drastically improves recognition performance on our new open industrial dataset comprised of 102 mechanical parts. We conclude with a set of guidelines that are relevant for practitioners needing to apply state-of-the-art unsupervised domain adaptation in practice. Our code is available at https://github.com/dritter-bht/synthnet-transfer-learning.
△ Less
Submitted 7 October, 2023;
originally announced October 2023.
-
Evaluating Zero-cost Active Learning for Object Detection
Authors:
Dominik Probst,
Hasnain Raza,
Erik Rodner
Abstract:
Object detection requires substantial labeling effort for learning robust models. Active learning can reduce this effort by intelligently selecting relevant examples to be annotated. However, selecting these examples properly without introducing a sampling bias with a negative impact on the generalization performance is not straightforward and most active learning techniques can not hold their pro…
▽ More
Object detection requires substantial labeling effort for learning robust models. Active learning can reduce this effort by intelligently selecting relevant examples to be annotated. However, selecting these examples properly without introducing a sampling bias with a negative impact on the generalization performance is not straightforward and most active learning techniques can not hold their promises on real-world benchmarks. In our evaluation paper, we focus on active learning techniques without a computational overhead besides inference, something we refer to as zero-cost active learning. In particular, we show that a key ingredient is not only the score on a bounding box level but also the technique used for aggregating the scores for ranking images. We outline our experimental setup and also discuss practical considerations when using active learning for object detection.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation
Authors:
Simon Reiß,
Constantin Seibold,
Alexander Freytag,
Erik Rodner,
Rainer Stiefelhagen
Abstract:
Pixel-wise segmentation is one of the most data and annotation hungry tasks in our field. Providing representative and accurate annotations is often mission-critical especially for challenging medical applications. In this paper, we propose a semi-weakly supervised segmentation algorithm to overcome this barrier. Our approach is based on a new formulation of deep supervision and student-teacher mo…
▽ More
Pixel-wise segmentation is one of the most data and annotation hungry tasks in our field. Providing representative and accurate annotations is often mission-critical especially for challenging medical applications. In this paper, we propose a semi-weakly supervised segmentation algorithm to overcome this barrier. Our approach is based on a new formulation of deep supervision and student-teacher model and allows for easy integration of different supervision signals. In contrast to previous work, we show that care has to be taken how deep supervision is integrated in lower layers and we present multi-label deep supervision as the most important secret ingredient for success. With our novel training regime for segmentation that flexibly makes use of images that are either fully labeled, marked with bounding boxes, just global labels, or not at all, we are able to cut the requirement for expensive labels by 94.22% - narrowing the gap to the best fully supervised baseline to only 5% mean IoU. Our approach is validated by extensive experiments on retinal fluid segmentation and we provide an in-depth analysis of the anticipated effect each annotation type can have in boosting segmentation performance.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection
Authors:
Björn Barz,
Erik Rodner,
Yanira Guanche Garcia,
Joachim Denzler
Abstract:
Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatio-temporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In opposition…
▽ More
Automatic detection of anomalies in space- and time-varying measurements is an important tool in several fields, e.g., fraud detection, climate analysis, or healthcare monitoring. We present an algorithm for detecting anomalous regions in multivariate spatio-temporal time-series, which allows for spotting the interesting parts in large amounts of data, including video and text data. In opposition to existing techniques for detecting isolated anomalous data points, we propose the "Maximally Divergent Intervals" (MDI) framework for unsupervised detection of coherent spatial regions and time intervals characterized by a high Kullback-Leibler divergence compared with all other data given. In this regard, we define an unbiased Kullback-Leibler divergence that allows for ranking regions of different size and show how to enable the algorithm to run on large-scale data sets in reasonable time using an interval proposal technique. Experiments on both synthetic and real data from various domains, such as climate analysis, video surveillance, and text forensics, demonstrate that our method is widely applicable and a valuable tool for finding interesting events in different types of data.
△ Less
Submitted 23 July, 2019; v1 submitted 19 April, 2018;
originally announced April 2018.
-
Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues
Authors:
Talha Qaiser,
Abhik Mukherjee,
Chaitanya Reddy Pb,
Sai Dileep Munugoti,
Vamsi Tallam,
Tomi Pitkäaho,
Taina Lehtimäki,
Thomas Naughton,
Matt Berseth,
Aníbal Pedraza,
Ramakrishnan Mukundan,
Matthew Smith,
Abhir Bhalerao,
Erik Rodner,
Marcel Simon,
Joachim Denzler,
Chao-Hui Huang,
Gloria Bueno,
David Snead,
Ian Ellis,
Mohammad Ilyas,
Nasir Rajpoot
Abstract:
Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variabili…
▽ More
Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting held in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised of digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.
△ Less
Submitted 24 July, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Generalized orderless pooling performs implicit salient matching
Authors:
Marcel Simon,
Yang Gao,
Trevor Darrell,
Joachim Denzler,
Erik Rodner
Abstract:
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training. In addition, we present a novel way to visualiz…
▽ More
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions made by these approaches. We identify parts of training images having the highest influence on the prediction of a given test image. It allows for justifying decisions to users and also for analyzing the influence of semantic parts. For example, we can show that the higher capacity VGG16 model focuses much more on the bird's head than, e.g., the lower-capacity VGG-M model when recognizing fine-grained bird categories. Both contributions allow us to analyze the difference when moving between average and bilinear pooling. In addition, experiments show that our generalized approach can outperform both across a variety of standard datasets.
△ Less
Submitted 20 July, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
Fast Learning and Prediction for Object Detection using Whitened CNN Features
Authors:
Björn Barz,
Erik Rodner,
Christoph Käding,
Joachim Denzler
Abstract:
We combine features extracted from pre-trained convolutional neural networks (CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of both: the high detection performance of CNNs, automatic feature engineering, fast model learning from few training samples and efficient sliding-window detection. The Adaptive Real-Time Object Detection System (ARTOS) has been refactored broadly…
▽ More
We combine features extracted from pre-trained convolutional neural networks (CNNs) with the fast, linear Exemplar-LDA classifier to get the advantages of both: the high detection performance of CNNs, automatic feature engineering, fast model learning from few training samples and efficient sliding-window detection. The Adaptive Real-Time Object Detection System (ARTOS) has been refactored broadly to be used in combination with Caffe for the experimental studies reported in this work.
△ Less
Submitted 12 April, 2017; v1 submitted 10 April, 2017;
originally announced April 2017.
-
Automatic Classification of Cancerous Tissue in Laserendomicroscopy Images of the Oral Cavity using Deep Learning
Authors:
Marc Aubreville,
Christian Knipfer,
Nicolai Oetter,
Christian Jaremenko,
Erik Rodner,
Joachim Denzler,
Christopher Bohr,
Helmut Neumann,
Florian Stelzle,
Andreas Maier
Abstract:
Oral Squamous Cell Carcinoma (OSCC) is a common type of cancer of the oral epithelium. Despite their high impact on mortality, sufficient screening methods for early diagnosis of OSCC often lack accuracy and thus OSCCs are mostly diagnosed at a late stage. Early detection and accurate outline estimation of OSCCs would lead to a better curative outcome and an reduction in recurrence rates after sur…
▽ More
Oral Squamous Cell Carcinoma (OSCC) is a common type of cancer of the oral epithelium. Despite their high impact on mortality, sufficient screening methods for early diagnosis of OSCC often lack accuracy and thus OSCCs are mostly diagnosed at a late stage. Early detection and accurate outline estimation of OSCCs would lead to a better curative outcome and an reduction in recurrence rates after surgical treatment.
Confocal Laser Endomicroscopy (CLE) records sub-surface micro-anatomical images for in vivo cell structure analysis. Recent CLE studies showed great prospects for a reliable, real-time ultrastructural imaging of OSCC in situ.
We present and evaluate a novel automatic approach for a highly accurate OSCC diagnosis using deep learning technologies on CLE images. The method is compared against textural feature-based machine learning approaches that represent the current state of the art.
For this work, CLE image sequences (7894 images) from patients diagnosed with OSCC were obtained from 4 specific locations in the oral cavity, including the OSCC lesion. The present approach is found to outperform the state of the art in CLE image recognition with an area under the curve (AUC) of 0.96 and a mean accuracy of 88.3% (sensitivity 86.6%, specificity 90%).
△ Less
Submitted 10 March, 2017; v1 submitted 5 March, 2017;
originally announced March 2017.
-
Active and Continuous Exploration with Deep Neural Networks and Expected Model Output Changes
Authors:
Christoph Käding,
Erik Rodner,
Alexander Freytag,
Joachim Denzler
Abstract:
The demands on visual recognition systems do not end with the complexity offered by current large-scale image datasets, such as ImageNet. In consequence, we need curious and continuously learning algorithms that actively acquire knowledge about semantic concepts which are present in available unlabeled data. As a step towards this goal, we show how to perform continuous active learning and explora…
▽ More
The demands on visual recognition systems do not end with the complexity offered by current large-scale image datasets, such as ImageNet. In consequence, we need curious and continuously learning algorithms that actively acquire knowledge about semantic concepts which are present in available unlabeled data. As a step towards this goal, we show how to perform continuous active learning and exploration, where an algorithm actively selects relevant batches of unlabeled examples for annotation. These examples could either belong to already known or to yet undiscovered classes. Our algorithm is based on a new generalization of the Expected Model Output Change principle for deep architectures and is especially tailored to deep neural networks. Furthermore, we show easy-to-implement approximations that yield efficient techniques for active selection. Empirical experiments show that our method outperforms currently used heuristics.
△ Less
Submitted 19 December, 2016;
originally announced December 2016.
-
ImageNet pre-trained models with batch normalization
Authors:
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pre-trained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with generation script as well as the batch-normalization-variants of AlexNet and VGG19. All models outp…
▽ More
Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pre-trained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with generation script as well as the batch-normalization-variants of AlexNet and VGG19. All models outperform previous models with the same architecture. The models and training code are available at http://www.inf-cv.uni-jena.de/Research/CNN+Models.html and https://github.com/cvjena/cnn-models
△ Less
Submitted 6 December, 2016; v1 submitted 5 December, 2016;
originally announced December 2016.
-
Maximally Divergent Intervals for Anomaly Detection
Authors:
Erik Rodner,
Björn Barz,
Yanira Guanche,
Milan Flach,
Miguel Mahecha,
Paul Bodesheim,
Markus Reichstein,
Joachim Denzler
Abstract:
We present new methods for batch anomaly detection in multivariate time series. Our methods are based on maximizing the Kullback-Leibler divergence between the data distribution within and outside an interval of the time series. An empirical analysis shows the benefits of our algorithms compared to methods that treat each time step independently from each other without optimizing with respect to a…
▽ More
We present new methods for batch anomaly detection in multivariate time series. Our methods are based on maximizing the Kullback-Leibler divergence between the data distribution within and outside an interval of the time series. An empirical analysis shows the benefits of our algorithms compared to methods that treat each time step independently from each other without optimizing with respect to all possible intervals.
△ Less
Submitted 21 October, 2016;
originally announced October 2016.
-
Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches
Authors:
Erik Rodner,
Marcel Simon,
Robert B. Fisher,
Joachim Denzler
Abstract:
In this paper, we study the sensitivity of CNN outputs with respect to image transformations and noise in the area of fine-grained recognition. In particular, we answer the following questions (1) how sensitive are CNNs with respect to image transformations encountered during wild image capture?; (2) how can we predict CNN sensitivity?; and (3) can we increase the robustness of CNNs with respect t…
▽ More
In this paper, we study the sensitivity of CNN outputs with respect to image transformations and noise in the area of fine-grained recognition. In particular, we answer the following questions (1) how sensitive are CNNs with respect to image transformations encountered during wild image capture?; (2) how can we predict CNN sensitivity?; and (3) can we increase the robustness of CNNs with respect to image degradations? To answer the first question, we provide an extensive empirical sensitivity analysis of commonly used CNN architectures (AlexNet, VGG19, GoogleNet) across various types of image degradations. This allows for predicting CNN performance for new domains comprised by images of lower quality or captured from a different viewpoint. We also show how the sensitivity of CNN outputs can be predicted for single images. Furthermore, we demonstrate that input layer dropout or pre-filtering during test time only reduces CNN sensitivity for high levels of degradation.
Experiments for fine-grained recognition tasks reveal that VGG19 is more robust to severe image degradations than AlexNet and GoogleNet. However, small intensity noise can lead to dramatic changes in CNN performance even for VGG19.
△ Less
Submitted 21 October, 2016;
originally announced October 2016.
-
Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets
Authors:
Manuel Amthor,
Erik Rodner,
Joachim Denzler
Abstract:
We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time budgets during application. They allow for individual budgets given a priori for each test example and for anytime prediction, i.e., a possible interruption at multiple stages during inference while still providing output estimates. Our approach can therefore tackle the computational costs and energy demands of DNNs in a…
▽ More
We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time budgets during application. They allow for individual budgets given a priori for each test example and for anytime prediction, i.e., a possible interruption at multiple stages during inference while still providing output estimates. Our approach can therefore tackle the computational costs and energy demands of DNNs in an adaptive manner, a property essential for real-time applications. Our Impatient DNNs are based on a new general framework of learning dynamic budget predictors using risk minimization, which can be applied to current DNN architectures by adding early prediction and additional loss layers. A key aspect of our method is that all of the intermediate predictors are learned jointly. In experiments, we evaluate our approach for different budget distributions, architectures, and datasets. Our results show a significant gain in expected accuracy compared to common baselines.
△ Less
Submitted 10 October, 2016;
originally announced October 2016.
-
Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks
Authors:
Clemens-Alexander Brust,
Sven Sickert,
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past, which are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training of deep neural networks. In particular, we apply QuickProp duri…
▽ More
Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past, which are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training of deep neural networks. In particular, we apply QuickProp during learning and testing of fully convolutional networks for the task of semantic segmentation. We compare QuickProp empirically with gradient descent, which is the current standard method. Experiments suggest that QuickProp can not compete with standard gradient descent techniques for complex computer vision tasks like semantic segmentation.
△ Less
Submitted 15 June, 2016; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Fine-grained Recognition Datasets for Biodiversity Analysis
Authors:
Erik Rodner,
Marcel Simon,
Gunnar Brehm,
Stephanie Pietsch,
J. Wolfgang Wägele,
Joachim Denzler
Abstract:
In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present first results with localized features using convolutional neural networks (CNN). We conclude…
▽ More
In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present first results with localized features using convolutional neural networks (CNN). We conclude with a list of challenging new research directions in the area of visual classification for biodiversity research.
△ Less
Submitted 3 July, 2015;
originally announced July 2015.
-
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
Authors:
Marcel Simon,
Erik Rodner
Abstract:
Part models of object categories are essential for challenging recognition tasks, where differences in categories are subtle and only reflected in appearances of small parts of the object. We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellation…
▽ More
Part models of object categories are essential for challenging recognition tasks, where differences in categories are subtle and only reflected in appearances of small parts of the object. We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellations of neural activation patterns computed using convolutional neural networks. In our experiments, we outperform existing approaches for fine-grained recognition on the CUB200-2011, NA birds, Oxford PETS, and Oxford Flowers dataset in case no part or bounding box annotations are available and achieve state-of-the-art performance for the Stanford Dog dataset. We also show the benefits of neural constellation models as a data augmentation technique for fine-tuning. Furthermore, our paper unites the areas of generic and fine-grained classification, since our approach is suitable for both scenarios. The source code of our method is available online at http://www.inf-cv.uni-jena.de/part_discovery
△ Less
Submitted 5 December, 2015; v1 submitted 30 April, 2015;
originally announced April 2015.
-
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding
Authors:
Clemens-Alexander Brust,
Sven Sickert,
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network,…
▽ More
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset.
Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.
△ Less
Submitted 23 February, 2015;
originally announced February 2015.
-
Part Detector Discovery in Deep Convolutional Neural Networks
Authors:
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery…
▽ More
Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without the necessity to actually train the network on the current dataset. Our approach called "part detector discovery" (PDD) is based on analyzing the gradient maps of the network outputs and finding activation centers spatially related to annotated semantic parts or bounding boxes.
This allows us not just to obtain excellent performance on the CUB200-2011 dataset, but in contrast to previous approaches also to perform detection and bird classification jointly without requiring a given bounding box annotation during testing and ground-truth parts during training. The code is available at http://www.inf-cv.uni-jena.de/part_discovery and https://github.com/cvjena/PartDetectorDisovery.
△ Less
Submitted 14 November, 2014; v1 submitted 12 November, 2014;
originally announced November 2014.
-
Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods
Authors:
Alexander Freytag,
Johannes Rühle,
Paul Bodesheim,
Erik Rodner,
Joachim Denzler
Abstract:
Vector-quantized local features frequently used in bag-of-visual-words approaches are the backbone of popular visual recognition systems due to both their simplicity and their performance. Despite their success, bag-of-words-histograms basically contain low-level image statistics (e.g., number of edges of different orientations). The question remains how much visual information is "lost in quantiz…
▽ More
Vector-quantized local features frequently used in bag-of-visual-words approaches are the backbone of popular visual recognition systems due to both their simplicity and their performance. Despite their success, bag-of-words-histograms basically contain low-level image statistics (e.g., number of edges of different orientations). The question remains how much visual information is "lost in quantization" when map** visual features to code words? To answer this question, we present an in-depth analysis of the effect of local feature quantization on human recognition performance. Our analysis is based on recovering the visual information by inverting quantized local features and presenting these visualizations with different codebook sizes to human observers. Although feature inversion techniques are around for quite a while, to the best of our knowledge, our technique is the first visualizing especially the effect of feature quantization. Thereby, we are now able to compare single steps in common image classification pipelines to human counterparts.
△ Less
Submitted 20 August, 2014;
originally announced August 2014.
-
ARTOS -- Adaptive Real-Time Object Detection System
Authors:
Björn Barz,
Erik Rodner,
Joachim Denzler
Abstract:
ARTOS is all about creating, tuning, and applying object detection models with just a few clicks. In particular, ARTOS facilitates learning of models for visual object detection by eliminating the burden of having to collect and annotate a large set of positive and negative samples manually and in addition it implements a fast learning technique to reduce the time needed for the learning step.
A…
▽ More
ARTOS is all about creating, tuning, and applying object detection models with just a few clicks. In particular, ARTOS facilitates learning of models for visual object detection by eliminating the burden of having to collect and annotate a large set of positive and negative samples manually and in addition it implements a fast learning technique to reduce the time needed for the learning step.
A clean and friendly GUI guides the user through the process of model creation, adaptation of learned models to different domains using in-situ images, and object detection on both offline images and images from a video stream. A library written in C++ provides the main functionality of ARTOS with a C-style procedural interface, so that it can be easily integrated with any other project.
△ Less
Submitted 25 August, 2014; v1 submitted 10 July, 2014;
originally announced July 2014.
-
Fine-grained Categorization -- Short Summary of our Entry for the ImageNet Challenge 2012
Authors:
Christoph Göring,
Alexander Freytag,
Erik Rodner,
Joachim Denzler
Abstract:
In this paper, we tackle the problem of visual categorization of dog breeds, which is a surprisingly challenging task due to simultaneously present low interclass distances and high intra-class variances. Our approach combines several techniques well known in our community but often not utilized for fine-grained recognition:
(1) automatic segmentation, (2) efficient part detection, and (3) combi…
▽ More
In this paper, we tackle the problem of visual categorization of dog breeds, which is a surprisingly challenging task due to simultaneously present low interclass distances and high intra-class variances. Our approach combines several techniques well known in our community but often not utilized for fine-grained recognition:
(1) automatic segmentation, (2) efficient part detection, and (3) combination of multiple features. In particular, we demonstrate that a simple head detector embedded in an off-the-shelf recognition pipeline can improve recognition accuracy quite significantly, highlighting the importance of part features for fine-grained recognition tasks. Using our approach, we achieved a 24.59% mean average precision performance on the Stanford dog dataset.
△ Less
Submitted 17 October, 2013;
originally announced October 2013.
-
Towards Adapting ImageNet to Reality: Scalable Domain Adaptation with Implicit Low-rank Transformations
Authors:
Erik Rodner,
Judy Hoffman,
Jeff Donahue,
Trevor Darrell,
Kate Saenko
Abstract:
Images seen during test time are often not from the same distribution as images used for learning. This problem, known as domain shift, occurs when training classifiers from object-centric internet image databases and trying to apply them directly to scene understanding tasks. The consequence is often severe performance degradation and is one of the major barriers for the application of classifier…
▽ More
Images seen during test time are often not from the same distribution as images used for learning. This problem, known as domain shift, occurs when training classifiers from object-centric internet image databases and trying to apply them directly to scene understanding tasks. The consequence is often severe performance degradation and is one of the major barriers for the application of classifiers in real-world systems. In this paper, we show how to learn transform-based domain adaptation classifiers in a scalable manner. The key idea is to exploit an implicit rank constraint, originated from a max-margin domain adaptation formulation, to make optimization tractable. Experiments show that the transformation between domains can be very efficiently learned from data and easily applied to new categories. This begins to bridge the gap between large-scale internet image collections and object images captured in everyday life environments.
△ Less
Submitted 19 August, 2013;
originally announced August 2013.
-
Efficient Learning of Domain-invariant Image Representations
Authors:
Judy Hoffman,
Erik Rodner,
Jeff Donahue,
Trevor Darrell,
Kate Saenko
Abstract:
We present an algorithm that learns representations which explicitly compensate for domain mismatch and which can be efficiently realized as linear classifiers. Specifically, we form a linear transformation that maps features from the target (test) domain to the source (training) domain as part of training the classifier. We optimize both the transformation and classifier parameters jointly, and i…
▽ More
We present an algorithm that learns representations which explicitly compensate for domain mismatch and which can be efficiently realized as linear classifiers. Specifically, we form a linear transformation that maps features from the target (test) domain to the source (training) domain as part of training the classifier. We optimize both the transformation and classifier parameters jointly, and introduce an efficient cost function based on misclassification loss. Our method combines several features previously unavailable in a single algorithm: multi-class adaptation through representation learning, ability to map across heterogeneous feature spaces, and scalability to large datasets. We present experiments on several image datasets that demonstrate improved accuracy and computational advantages compared to previous approaches.
△ Less
Submitted 8 April, 2013; v1 submitted 14 January, 2013;
originally announced January 2013.
-
Visual Transfer Learning: Informal Introduction and Literature Overview
Authors:
Erik Rodner
Abstract:
Transfer learning techniques are important to handle small training sets and to allow for quick generalization even from only a few examples. The following paper is the introduction as well as the literature overview part of my thesis related to the topic of transfer learning for visual recognition problems.
Transfer learning techniques are important to handle small training sets and to allow for quick generalization even from only a few examples. The following paper is the introduction as well as the literature overview part of my thesis related to the topic of transfer learning for visual recognition problems.
△ Less
Submitted 6 November, 2012;
originally announced November 2012.