Search | arXiv e-print repository

doi 10.1007/s10664-024-10440-0

ROBUST: 221 Bugs in the Robot Operating System

Authors: Christopher S. Timperley, Gijs van der Hoorn, André Santos, Harshavardhan Deshpande, Andrzej Wąsowski

Abstract: As robotic systems such as autonomous cars and delivery drones assume greater roles and responsibilities within society, the likelihood and impact of catastrophic software failure within those systems is increased.To aid researchers in the development of new methods to measure and assure the safety and quality of robotics software, we systematically curated a dataset of 221 bugs across 7 popular a… ▽ More As robotic systems such as autonomous cars and delivery drones assume greater roles and responsibilities within society, the likelihood and impact of catastrophic software failure within those systems is increased.To aid researchers in the development of new methods to measure and assure the safety and quality of robotics software, we systematically curated a dataset of 221 bugs across 7 popular and diverse software systems implemented via the Robot Operating System (ROS). We produce historically accurate recreations of each of the 221 defective software versions in the form of Docker images, and use a grounded theory approach to examine and categorize their corresponding faults, failures, and fixes. Finally, we reflect on the implications of our findings and outline future research directions for the community. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Journal ref: ROBUST: 221 bugs in the Robot Operating System CS Timperley, G van der Hoorn, A Santos, H Deshpande, A Wąsowski Empirical Software Engineering 29 (3), 57, 2024

arXiv:2403.12029 [pdf, other]

Align and Distill: Unifying and Improving Domain Adaptive Object Detection

Authors: Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, Grant Van Horn

Abstract: Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inc… ▽ More Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +2.0 AP50 on CFC Kenai to Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 30 pages, 10 figures

arXiv:2401.02460 [pdf, other]

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Authors: Oindrila Saha, Grant Van Horn, Subhransu Maji

Abstract: The zero-shot performance of existing vision-language models (VLMs) such as CLIP is limited by the availability of large-scale, aligned image and text datasets in specific domains. In this work, we leverage two complementary sources of information -- descriptions of categories generated by large language models (LLMs) and abundant, fine-grained image classification datasets -- to improve the zero-… ▽ More The zero-shot performance of existing vision-language models (VLMs) such as CLIP is limited by the availability of large-scale, aligned image and text datasets in specific domains. In this work, we leverage two complementary sources of information -- descriptions of categories generated by large language models (LLMs) and abundant, fine-grained image classification datasets -- to improve the zero-shot classification performance of VLMs across fine-grained domains. On the technical side, we develop methods to train VLMs with this "bag-level" image-text supervision. We find that simply using these attributes at test-time does not improve performance, but our training strategy, for example, on the iNaturalist dataset, leads to an average improvement of 4-5% in zero-shot classification accuracy for novel categories of birds and flowers. Similar improvements are observed in domains where a subset of the categories was used to fine-tune the model. By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions and pair them with existing attributes such as the taxonomic structure of the categories. We systematically evaluate their ability to improve zero-shot categorization in natural domains. Our findings suggest that geographic priors can be just as effective and are complementary to visual appearance. Our method also outperforms prior work on prompt-based tuning of VLMs. We release the benchmark, consisting of 14 datasets at https://github.com/cvl-umass/AdaptCLIPZS , which will contribute to future research in zero-shot recognition. △ Less

Submitted 3 April, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.05287 [pdf, other]

Human in-the-Loop Estimation of Cluster Count in Datasets via Similarity-Driven Nested Importance Sampling

Authors: Gustavo Perez, Daniel Sheldon, Grant Van Horn, Subhransu Maji

Abstract: Identifying the number of clusters serves as a preliminary goal for many data analysis tasks. A common approach to this problem is to vary the number of clusters in a clustering algorithm (e.g., 'k' in $k$-means) and pick the value that best explains the data. However, the count estimates can be unreliable especially when the image similarity is poor. Human feedback on the pairwise similarity can… ▽ More Identifying the number of clusters serves as a preliminary goal for many data analysis tasks. A common approach to this problem is to vary the number of clusters in a clustering algorithm (e.g., 'k' in $k$-means) and pick the value that best explains the data. However, the count estimates can be unreliable especially when the image similarity is poor. Human feedback on the pairwise similarity can be used to improve the clustering, but existing approaches do not guarantee accurate count estimates. We propose an approach to produce estimates of the cluster counts in a large dataset given an approximate pairwise similarity. Our framework samples edges guided by the pairwise similarity, and we collect human feedback to construct a statistical estimate of the cluster count. On the technical front we have developed a nested importance sampling approach that yields (asymptotically) unbiased estimates of the cluster count with confidence intervals which can guide human effort. Compared to naive sampling, our similarity-driven sampling produces more accurate estimates of counts and tighter confidence intervals. We evaluate our method on a benchmark of six fine-grained image classification datasets achieving low error rates on the estimated number of clusters with significantly less human labeling effort compared to baselines and alternative active clustering approaches. △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.14328 [pdf, other]

Offline Skill Generalization via Task and Motion Planning

Authors: Shin Watanabe, Geir Horn, Jim Tørresen, Kai Olav Ellefsen

Abstract: This paper presents a novel approach to generalizing robot manipulation skills by combining a sampling-based task-and-motion planner with an offline reinforcement learning algorithm. Starting with a small library of scripted primitive skills (e.g. Push) and object-centric symbolic predicates (e.g. On(block, plate)), the planner autonomously generates a demonstration dataset of manipulation skills… ▽ More This paper presents a novel approach to generalizing robot manipulation skills by combining a sampling-based task-and-motion planner with an offline reinforcement learning algorithm. Starting with a small library of scripted primitive skills (e.g. Push) and object-centric symbolic predicates (e.g. On(block, plate)), the planner autonomously generates a demonstration dataset of manipulation skills in the context of a long-horizon task. An offline reinforcement learning algorithm then extracts a policy from the dataset without further interactions with the environment and replaces the scripted skill in the existing library. Refining the skill library improves the robustness of the planner, which in turn facilitates data collection for more complex manipulation skills. We validate our approach in simulation, on a block-pushing task. We show that the proposed method requires less training data than conventional reinforcement learning methods. Furthermore, interaction with the environment is collision-free because of the use of planner demonstrations, making the approach more amenable to persistent robot learning in the real world. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures

arXiv:2311.02061 [pdf, other]

Active Learning-Based Species Range Estimation

Authors: Christian Lange, Elijah Cole, Grant Van Horn, Oisin Mac Aodha

Abstract: We propose a new active learning approach for efficiently estimating the geographic range of a species from a limited number of on the ground observations. We model the range of an unmapped species of interest as the weighted combination of estimated ranges obtained from a set of different species. We show that it is possible to generate this candidate set of ranges by using models that have been… ▽ More We propose a new active learning approach for efficiently estimating the geographic range of a species from a limited number of on the ground observations. We model the range of an unmapped species of interest as the weighted combination of estimated ranges obtained from a set of different species. We show that it is possible to generate this candidate set of ranges by using models that have been trained on large weakly supervised community collected observation data. From this, we develop a new active querying approach that sequentially selects geographic locations to visit that best reduce our uncertainty over an unmapped species' range. We conduct a detailed evaluation of our approach and compare it to existing active learning methods using an evaluation dataset containing expert-derived ranges for one thousand species. Our results demonstrate that our method outperforms alternative active learning methods and approaches the performance of end-to-end trained models, even when only using a fraction of the data. This highlights the utility of active learning via transfer learned spatial representations for species range estimation. It also emphasizes the value of leveraging emerging large-scale crowdsourced datasets, not only for modeling a species' range, but also for actively discovering them. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2306.02564 [pdf, other]

Spatial Implicit Neural Representations for Global-Scale Species Map**

Authors: Elijah Cole, Grant Van Horn, Christian Lange, Alexander Shepard, Patrick Leary, Pietro Perona, Scott Loarie, Oisin Mac Aodha

Abstract: Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging l… ▽ More Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets which can include tens of millions of records for hundreds of thousands of species. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously. We find that our approach scales gracefully, making increasingly better predictions as we increase the number of species and the amount of data per species when training. To make this problem accessible to machine learning researchers, we provide four new benchmarks that measure different aspects of species range estimation and spatial representation learning. Using these benchmarks, we demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species. △ Less

Submitted 4 June, 2023; originally announced June 2023.

Comments: ICML 2023

arXiv:2207.10664 [pdf, other]

Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset

Authors: Grant Van Horn, Rui Qian, Kimberly Wilber, Hartwig Adam, Oisin Mac Aodha, Serge Belongie

Abstract: We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorization on images, the counterparts in audio and video fine-grained categorization are relatively unexplored. To encourage advancements in this space, we have carefully constructed the SSW60 datas… ▽ More We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorization on images, the counterparts in audio and video fine-grained categorization are relatively unexplored. To encourage advancements in this space, we have carefully constructed the SSW60 dataset to enable researchers to experiment with classifying the same set of categories in three different modalities: images, audio, and video. The dataset covers 60 species of birds and is comprised of images from existing datasets, and brand new, expert-curated audio and video datasets. We thoroughly benchmark audiovisual classification performance and modality fusion experiments through the use of state-of-the-art transformer methods. Our findings show that performance of audiovisual fusion methods is better than using exclusively image or audio based methods for the task of video classification. We also present interesting modality transfer experiments, enabled by the unique construction of SSW60 to encompass three different modalities. We hope the SSW60 dataset and accompanying baselines spur research in this fascinating area. △ Less

Submitted 21 July, 2022; originally announced July 2022.

Comments: ECCV 2022 Camera Ready

arXiv:2207.10225 [pdf, other]

On Label Granularity and Object Localization

Authors: Elijah Cole, Kimberly Wilber, Grant Van Horn, Xuan Yang, Marco Fornoni, Pietro Perona, Serge Belongie, Andrew Howard, Oisin Mac Aodha

Abstract: Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation w… ▽ More Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency. △ Less

Submitted 20 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2207.09295 [pdf, other]

The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting

Authors: Justin Kay, Peter Kulits, Suzanne Stathatos, Siqi Deng, Erik Young, Sara Beery, Grant Van Horn, Pietro Perona

Abstract: We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely… ▽ More We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting. △ Less

Submitted 19 July, 2022; originally announced July 2022.

Comments: ECCV 2022. 33 pages, 12 figures

arXiv:2110.12951 [pdf, other]

doi 10.1038/s41467-022-27980-y

Seeing biodiversity: perspectives in machine learning for wildlife conservation

Authors: Devis Tuia, Benjamin Kellenberger, Sara Beery, Blair R. Costelloe, Silvia Zuffi, Benjamin Risse, Alexander Mathis, Mackenzie W. Mathis, Frank van Langevelde, Tilo Burghardt, Roland Kays, Holger Klinck, Martin Wikelski, Iain D. Couzin, Grant van Horn, Margaret C. Crofoot, Charles V. Stewart, Tanya Berger-Wolf

Abstract: Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how the… ▽ More Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how they ingest, digest, and distill data into relevant information. We argue that machine learning, and especially deep learning approaches, can meet this analytic challenge to enhance our understanding, monitoring capacity, and conservation of wildlife species. Incorporating machine learning into ecological workflows could improve inputs for population and behavior models and eventually lead to integrated hybrid modeling tools, with ecological models acting as constraints for machine learning models and the latter providing data-supported insights. In essence, by combining new machine learning approaches with ecological domain knowledge, animal ecologists can capitalize on the abundance of data generated by modern sensor technologies in order to reliably estimate population abundances, study animal behavior and mitigate human/wildlife conflicts. To succeed, this approach will require close collaboration and cross-disciplinary education between the computer science and animal ecology communities in order to ensure the quality of machine learning approaches and train a new generation of data scientists in ecology and conservation. △ Less

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2103.16483 [pdf, other]

Benchmarking Representation Learning for Natural World Image Collections

Authors: Grant Van Horn, Elijah Cole, Sara Beery, Kimberly Wilber, Serge Belongie, Oisin Mac Aodha

Abstract: Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such a… ▽ More Recent progress in self-supervised learning has resulted in models that are capable of extracting rich representations from image collections without requiring any explicit label supervision. However, to date the vast majority of these approaches have restricted themselves to training on standard benchmark datasets such as ImageNet. We argue that fine-grained visual categorization problems, such as plant and animal species classification, provide an informative testbed for self-supervised learning. In order to facilitate progress in this area we present two new natural world visual classification datasets, iNat2021 and NeWT. The former consists of 2.7M images from 10k different species uploaded by users of the citizen science application iNaturalist. We designed the latter, NeWT, in collaboration with domain experts with the aim of benchmarking the performance of representation learning algorithms on a suite of challenging natural world binary classification tasks that go beyond standard species classification. These two new datasets allow us to explore questions related to large-scale representation and transfer learning in the context of fine-grained categories. We provide a comprehensive analysis of feature extractors trained with and without supervision on ImageNet and iNat2021, shedding light on the strengths and weaknesses of different learned features across a diverse set of tasks. We find that features produced by standard supervised methods still outperform those produced by self-supervised approaches such as SimCLR. However, improved self-supervised learning methods are constantly being released and the iNat2021 and NeWT datasets are a valuable resource for tracking their progress. △ Less

Submitted 8 June, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2010.09145 [pdf, other]

MROS: Runtime Adaptation For Robot Control Architectures

Authors: Darko Bozhinoski, Carlos Hernandez Corbato, Mario Garzon Oviedo, Gijs van der Hoorn, Nadia Hammoudeh Garcia, Harshavardhan Deshpande, Jon Tjerngren, Andrzej Wasowski

Abstract: Known attempts to build autonomous robots rely on complex control architectures, often implemented with the Robot Operating System platform (ROS). Runtime adaptation is needed in these systems, to cope with component failures and with contingencies arising from dynamic environments-otherwise, these affect the reliability and quality of the mission execution. Existing proposals on how to build self… ▽ More Known attempts to build autonomous robots rely on complex control architectures, often implemented with the Robot Operating System platform (ROS). Runtime adaptation is needed in these systems, to cope with component failures and with contingencies arising from dynamic environments-otherwise, these affect the reliability and quality of the mission execution. Existing proposals on how to build self-adaptive systems in robotics usually require a major re-design of the control architecture and rely on complex tools unfamiliar to the robotics community. Moreover, they are hard to reuse across applications. This paper presents MROS: a model-based framework for run-time adaptation of robot control architectures based on ROS. MROS uses a combination of domain-specific languages to model architectural variants and captures mission quality concerns, and an ontology-based implementation of the MAPE-K and meta-control visions for run-time adaptation. The experiment results obtained applying MROS in two realistic ROS-based robotic demonstrators show the benefits of our approach in terms of the quality of the mission execution, and MROS' extensibility and re-usability across robotic applications. △ Less

Submitted 23 November, 2021; v1 submitted 18 October, 2020; originally announced October 2020.

arXiv:1904.05986 [pdf, other]

The iWildCam 2018 Challenge Dataset

Authors: Sara Beery, Grant van Horn, Oisin Mac Aodha, Pietro Perona

Abstract: Camera traps are a valuable tool for studying biodiversity, but research using this data is limited by the speed of human annotation. With the vast amounts of data now available it is imperative that we develop automatic solutions for annotating camera trap data in order to allow this research to scale. A promising approach is based on deep networks trained on human-annotated images. We provide a… ▽ More Camera traps are a valuable tool for studying biodiversity, but research using this data is limited by the speed of human annotation. With the vast amounts of data now available it is imperative that we develop automatic solutions for annotating camera trap data in order to allow this research to scale. A promising approach is based on deep networks trained on human-annotated images. We provide a challenge dataset to explore whether such solutions generalize to novel locations, since systems that are trained once and may be deployed to operate automatically in new locations would be most useful. △ Less

Submitted 24 April, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: Challenge hosted at the fifth Fine-Grained Visual Categorization Workshop (FGVC5) at CVPR 2018

arXiv:1807.04975 [pdf, other]

Recognition in Terra Incognita

Authors: Sara Beery, Grant van Horn, Pietro Perona

Abstract: It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera trap… ▽ More It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems. △ Less

Submitted 24 July, 2018; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: Accepted to ECCV 2018

arXiv:1709.01450 [pdf, other]

The Devil is in the Tails: Fine-grained Classification in the Wild

Authors: Grant Van Horn, Pietro Perona

Abstract: The world is long-tailed. What does this mean for computer vision and visual recognition? The main two implications are (1) the number of categories we need to consider in applications can be very large, and (2) the number of training examples for most categories can be very small. Current visual recognition algorithms have achieved excellent classification accuracy. However, they require many tra… ▽ More The world is long-tailed. What does this mean for computer vision and visual recognition? The main two implications are (1) the number of categories we need to consider in applications can be very large, and (2) the number of training examples for most categories can be very small. Current visual recognition algorithms have achieved excellent classification accuracy. However, they require many training examples to reach peak performance, which suggests that long-tailed distributions will not be dealt with well. We analyze this question in the context of eBird, a large fine-grained classification dataset, and a state-of-the-art deep network classification algorithm. We find that (a) peak classification performance on well-represented categories is excellent, (b) given enough data, classification performance suffers only minimally from an increase in the number of classes, (c) classification performance decays precipitously as the number of training examples decreases, (d) surprisingly, transfer learning is virtually absent in current methods. Our findings suggest that our community should come to grips with the question of long tails. △ Less

Submitted 5 September, 2017; originally announced September 2017.

arXiv:1707.06642 [pdf, other]

The iNaturalist Species Classification and Detection Dataset

Authors: Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie

Abstract: Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist species classification and detection dataset,… ▽ More Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real world conditions we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all over the world. Images were collected with different camera types, have varying image quality, feature a large class imbalance, and have been verified by multiple citizen scientists. We discuss the collection of the dataset and present extensive baseline experiments using state-of-the-art computer vision classification and detection models. Results show that current non-ensemble based methods achieve only 67% top one classification accuracy, illustrating the difficulty of the dataset. Specifically, we observe poor results for classes with small numbers of training examples suggesting more attention is needed in low-shot learning. △ Less

Submitted 10 April, 2018; v1 submitted 20 July, 2017; originally announced July 2017.

Comments: CVPR 2018

arXiv:1508.05494 [pdf, other]

doi 10.1002/zamm.201500180

A quaternion-based model for optimal control of the SkySails airborne wind energy system

Authors: Michael Erhard, Greg Horn, Moritz Diehl

Abstract: Airborne wind energy systems are capable of extracting energy from higher wind speeds at higher altitudes. The configuration considered in this paper is based on a tethered kite flown in a pum** orbit. This pum** cycle generates energy by winching out at high tether forces and driving a generator while flying figures-of-eight, or lemniscates, as crosswind pattern. Then, the tether is reeled in… ▽ More Airborne wind energy systems are capable of extracting energy from higher wind speeds at higher altitudes. The configuration considered in this paper is based on a tethered kite flown in a pum** orbit. This pum** cycle generates energy by winching out at high tether forces and driving a generator while flying figures-of-eight, or lemniscates, as crosswind pattern. Then, the tether is reeled in while kee** the kite at a neutral position, thus leaving a net amount of generated energy. In order to achieve an economic operation, optimization of pum** cycles is of great interest. In this paper, first the principles of airborne wind energy will be briefly revisited. The first contribution is a singularity-free model for the tethered kite dynamics in quaternion representation, where the model is derived from first principles. The second contribution is an optimal control formulation and numerical results for complete pum** cycles. Based on the developed model, the setup of the optimal control problem (OCP) is described in detail along with its numerical solution based on the direct multiple shooting method in the CasADi optimization environment. Optimization results for a pum** cycle consisting of six lemniscates show that the approach is capable to find an optimal orbit in a few minutes of computation time. For this optimal orbit, the power output is increased by a factor of two compared to a sophisticated initial guess for the considered test scenario. △ Less

Submitted 22 August, 2015; originally announced August 2015.

Comments: 18 pages, 14 figures, submitted to Journal of Applied Mathematics and Mechanics (ZAMM)

MSC Class: 49N90

arXiv:1406.2952 [pdf, other]

Bird Species Categorization Using Pose Normalized Deep Convolutional Nets

Authors: Steve Branson, Grant Van Horn, Serge Belongie, Pietro Perona

Abstract: We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located an… ▽ More We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric war** functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%). △ Less

Submitted 11 June, 2014; originally announced June 2014.

Showing 1–19 of 19 results for author: Horn, G